CHRIST (Deemed to be University), Bangalore

DEPARTMENT OF COMPUTER SCIENCE

School of Business and Management

Syllabus for
Master of Science (Data Science)
Academic Year  (2023)

 
1 Semester - 2023 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS131 | RESEARCH METHODS IN DATA SCIENCE | Core Courses | 5 | 4 | 100
MDS132 | PROBABILITY AND DISTRIBUTION THEORY | Core Courses | 5 | 4 | 100
MDS133 | MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE-I | Core Courses | 4 | 3 | 100
MDS151 | APPLIED EXCEL | Discipline Specific Elective Courses | 3 | 1 | 50
MDS161A | PRINCIPLES OF PROGRAMMING | Discipline Specific Elective Courses | 3 | 2 | 50
MDS161B | INTRODUCTION TO PROBABILITY AND STATISTICS | Discipline Specific Elective Courses | 3 | 2 | 50
MDS161C | LINUX ESSENTIALS | Discipline Specific Elective Courses | 3 | 2 | 50
MDS171 | PROGRAMMING USING PYTHON | Core Courses | 8 | 5 | 150
2 Semester - 2023 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS231 | DESIGN AND ANALYSIS OF ALGORITHMS | Core Courses | 4 | 3 | 100
MDS232 | MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE-II | Core Courses | 3 | 3 | 100
MDS271 | DATABASE TECHNOLOGIES | Core Courses | 4 | 4 | 100
MDS272 | INFERENTIAL STATISTICS USING R | Core Courses | 7 | 4 | 100
MDS273 | FULL STACK WEB DEVELOPMENT | Core Courses | 7 | 4 | 100
3 Semester - 2022 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS311 | PROGRAMMING FOR DATA SCIENCE IN R | Core Courses | 2 | 2 | 50
MDS331 | NEURAL NETWORKS AND DEEP LEARNING | Core Courses | 4 | 4 | 100
MDS341A | TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES | Discipline Specific Elective Courses | 4 | 4 | 100
MDS341B | BAYESIAN INFERENCE | Discipline Specific Elective Courses | 4 | 4 | 100
MDS341C | ECONOMETRICS | Discipline Specific Elective Courses | 4 | 4 | 100
MDS341D | BIO-STATISTICS | Discipline Specific Elective Courses | 4 | 4 | 100
MDS342C | STOCHASTIC PROCESSES | - | 4 | 3 | 100
MDS371 | CLOUD ANALYTICS | Core Courses | 6 | 5 | 150
MDS372 | BUSINESS INTELLIGENCE | Core Courses | 5 | 4 | 4
MDS373A | NATURAL LANGUAGE PROCESSING | Discipline Specific Elective Courses | 6 | 5 | 150
MDS373B | HADOOP | Discipline Specific Elective Courses | 6 | 5 | 150
MDS373C | BIO INFORMATICS | Discipline Specific Elective Courses | 6 | 5 | 150
MDS373D | EVOLUTIONARY ALGORITHMS | Discipline Specific Elective Courses | 6 | 5 | 150
MDS373E | OPTIMIZATION TECHNIQUE | Discipline Specific Elective Courses | 6 | 5 | 150
MDS381 | SPECIALIZATION PROJECT | Core Courses | 4 | 2 | 100
4 Semester - 2022 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS481 | INDUSTRY PROJECT | Core Courses | 2 | 12 | 300
    

    

Introduction to Program:

Data Science is widely used across academia, business sectors, and research and development to make effective decisions in day-to-day activities. The MSc in Data Science is a two-year programme with six trimesters. The programme aims to give all candidates the opportunity to master the skill sets specific to data science with a research bent. The curriculum helps students obtain adequate knowledge of the theory of data science, with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc.

Programme Outcome/Programme Learning Goals/Programme Learning Outcome:

PO1: Problem Analysis and Design: Ability to identify, analyze and design solutions for data science problems using fundamental principles of mathematics, statistics, computing sciences, and relevant domain disciplines.

PO2: Enhance disciplinary competency and employability: Acquire the skills in handling data science programming tools towards problem solving and solution analysis for domain specific problems.

PO3: Societal and Environmental Concern: Utilize the data science theories for societal and environmental concerns.

PO4: Professional Ethics: Understand and commit to professional ethics and professional computing practices to enhance research culture and uphold the scientific integrity and objectivity.

PO5: Individual and Team work: Function effectively as an individual and as a member or leader in diverse teams and in multidisciplinary environments.

PO6: Engage in continuous reflective learning in the context of technology advancement: Understand the evolving data and analysis paradigms and apply them to solve real-life problems in the fields of data science.

Assessment Pattern

CIA - 50%

ESE - 50%

Examination And Assessments

Evaluation pattern for full CIA courses:

 

The “Theory and Practical” Type of courses offered in all UG/PG programmes will be considered as Full CIA courses.

 

For this type of course, there is no exclusive Mid Semester Examination or End Semester Examination; instead, there shall be continuous evaluation during the semester through the following components:

 

CAC – Continuous Assessment Component

Assessment components such as Hard copy / Soft copy Assignment, Quiz, Presentation, Video Making, MOOC, Project, Demonstration, Service Learning, etc

CAT – Continuous Assessment Test

A written / Lab test would be conducted on any working day

 

The total marks for the full CIA courses would vary based on the number of hours allocated in a week for the respective course. Out of the maximum marks allotted to the respective course, 50% of the marks will be considered as CIA and the remaining 50% as ESE, based on the combinations of the evaluation components (CAC and CAT).

MDS131 - RESEARCH METHODS IN DATA SCIENCE (2023 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:5
Max Marks:100
Credits:4

Course Objectives/Course Description

 

 To assist students in planning and carrying out research work in the field of data science. The students are exposed to the basic principles, procedures and techniques of implementing a research project. The course provides a strong foundation for data science and the application area related to it. Students are trained to understand the underlying core concepts and the importance of ethics while handling data and problems in data science.

Course Outcome

CO1: Understand the essence of research and the importance of research methods and methodology

CO2: Explore the fundamental concepts of data science

CO3: Understand various machine learning algorithms used in data science process

CO4: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision making

CO5: Create scientific reports according to specified standards

Unit-1
Teaching Hours:12
Research Methodology
 

Introduction:

Objectives of Research, Types of Research, Research Approaches, Significance of Research, Research Methods versus Methodology. Defining research problem: Selecting the problem, Necessity of defining the problem, Techniques involved in defining a problem. Research Design: Different Research Designs, Basic Principles of Experimental Designs, Developing a Research Plan.

Unit-2
Teaching Hours:12
Sampling, Measurement and Scaling Techniques
 

Sampling: Steps in Sampling Design, Different Types of Sample Designs, Measurement and Scaling: Measurement in Research, Measurement Scales, Technique of Developing Measurement Tools, Scaling, Important Scaling Techniques

Unit-2
Teaching Hours:12
Introduction to Data Science
 

Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? - Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.

Unit-3
Teaching Hours:12
Machine Learning
 

Machine learning – Modeling Process – Training model – Validating model – Predicting new observations – Supervised learning algorithms – Unsupervised learning algorithms.

Unit-4
Teaching Hours:12
Report Writing
 

Working with Literature: Importance, Finding literature, Using the resources, Managing the literature, Keeping track of references, Literature review. Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions. LaTeX: Introduction, Text, Tables, Figures, Equations, Citations, Referencing, and Templates (IEEE style), Paper writing for international journals, Writing scientific reports.

Unit-5
Teaching Hours:12
Ethics in Research and Data Science
 

Research ethics, Data Science ethics – Doing good data science – Owners of the data - Valuing different aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion.

Text Books And Reference Books:
  1. Davy Cielen and Arno Meysman, Introducing Data Science. Simon and Schuster, 2016.
  2. M. Loukides, H. Mason, and D. Patil, Ethics and Data Science. O’Reilly Media, 2018.
  3. C. R. Kothari, Research Methodology Methods and Techniques. 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014.
  4. Zina O’Leary, The Essential Guide to Doing Research. New Delhi: PHI, 2005.
Essential Reading / Recommended Reading
  1. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st edition, 2015
  2. Doing Data Science, Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013
  3. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014
  4. Sinan Ozdemir, Principles of Data Science: Learn the Techniques and Math You Need to Start Making Sense of Your Data. Birmingham: Packt, December 2016.
  5. J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 4th ed. SAGE Publications, 2014.
  6. Kumar, Research Methodology: A Step-by-Step Guide for Beginners. 3rd. ed. Indian: PE, 2010.
Evaluation Pattern

CIA - 50%

ESE - 50%

MDS132 - PROBABILITY AND DISTRIBUTION THEORY (2023 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:5
Max Marks:100
Credits:4

Course Objectives/Course Description

 

Probability and probability distributions play an essential role in modeling data from real-world phenomena. This course will equip students with thorough knowledge of probability and various probability distributions, and enable them to model real-life data sets with an appropriate probability distribution.

Course Outcome

CO1: Describe random event and probability of events.

CO2: Identify various discrete and continuous distributions and their usage.

CO3: Evaluate conditional probabilities and conditional expectations.

CO4: Apply Chebychev’s inequality to verify the convergence of a sequence in probability.

Unit-1
Teaching Hours:12
Descriptive Statistics and Probability
 

Data – types of variables: numeric vs categorical - measures of central tendency – measures of dispersion - random experiment - sample space and random events – probability - probability axioms - finite sample space with equally likely outcomes - conditional probability - independent events - Bayes’ theorem
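
As an illustration of conditional probability and Bayes’ theorem, the following is a minimal Python sketch that computes a posterior probability. The function name `bayes_posterior` and the numbers used (a 1% prior, 95% sensitivity, 10% false-positive rate) are illustrative assumptions and are not part of the syllabus.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' theorem.

    prior               -- P(A)
    likelihood          -- P(B | A)
    false_positive_rate -- P(B | not A)
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prior, 95% sensitivity, 10% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.10)
print(round(posterior, 4))
```

Even with a sensitive test, the posterior stays well below the likelihood here because the prior is small, which is the usual point of such an exercise.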

Unit-2
Teaching Hours:12
Probability Distributions for Discrete Data
 

Random variable – data as observed values of a random variable - expectation – moments & moment generating function - mean and variance in terms of moments - discrete sample space and discrete random variable – Bernoulli experiment and binary variable: Bernoulli and binomial distributions – Count data: Poisson distribution – overdispersion in count data: negative binomial distribution – dependent Bernoulli trials: hypergeometric distribution (mean and variances in terms of mgf).
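
To make "mean and variance in terms of moments" concrete, here is a small Python sketch that computes the first two raw moments of a binomial distribution directly from its pmf and compares them with the closed forms np and np(1-p). It sums over the pmf rather than differentiating the moment generating function; the helper names and the parameters n = 10, p = 0.3 are assumptions chosen for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def moments_from_pmf(n, p):
    """Return (mean, variance) computed directly from the pmf."""
    first = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))       # E[X]
    second = sum(k * k * binomial_pmf(k, n, p) for k in range(n + 1))  # E[X^2]
    return first, second - first**2  # Var(X) = E[X^2] - (E[X])^2


n, p = 10, 0.3  # illustrative parameters
mean, variance = moments_from_pmf(n, p)
print(mean, n * p)                   # both should be close to 3.0
print(variance, n * p * (1.0 - p))   # both should be close to 2.1
```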

Unit-3
Teaching Hours:12
Probability Distributions For Continuous Data
 

Continuous sample space - Interval data - continuous random variable – uniform distribution - normal distribution (Gaussian distribution) – modeling lifetime data: exponential distribution, gamma distribution, Weibull distribution (Applications in Data science).

Unit-4
Teaching Hours:12
Jointly Distributed Random Variables
 

Joint distribution of vector random variables – joint moments – covariance – correlation - independent random variables - conditional distribution – conditional expectation - sampling distributions: chi-square, t, F (pdf’s & properties).

Unit-5
Teaching Hours:12
Limit Theorems
 

Chebychev’s inequality - weak law of large numbers (iid): examples - strong law of large numbers (statement only) - central limit theorems (iid case): examples.
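
The link between Chebychev’s inequality and the weak law of large numbers can be checked numerically. The sketch below is one possible illustration under stated assumptions: it draws iid Uniform(0,1) samples, estimates how often the sample mean deviates from 0.5 by at least eps, and compares that frequency with the Chebychev bound Var(X)/(n eps^2). The function names, the seed, and the choice of distribution are assumptions made for this example.

```python
import random


def tail_frequency(n, eps, trials, seed=0):
    """Estimate P(|mean of n Uniform(0,1) draws - 0.5| >= eps) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - 0.5) >= eps:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps):
    """Chebychev bound Var(X)/(n * eps^2), with Var(X) = 1/12 for Uniform(0,1)."""
    return min(1.0, (1.0 / 12.0) / (n * eps**2))


eps = 0.1
for n in (10, 100, 1000):
    # Columns: sample size, simulated tail frequency, Chebychev upper bound.
    print(n, tail_frequency(n, eps, trials=1000), chebyshev_bound(n, eps))
```

Both columns shrink as n grows, and the bound is typically far from tight; the inequality only guarantees that the simulated probability cannot exceed it.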

Text Books And Reference Books:

[1] Introduction to the theory of statistics. A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.

[2] Introduction to probability models. Ross, Sheldon M. 12th Edition, Academic Press, 2019.

[3] Fundamentals of Applied Mathematics, S.C. Gupta and V.K. Kapoor (New Edition)

 

Essential Reading / Recommended Reading

 [1] A first course in probability. Ross, Sheldon, 10th Edition. Pearson, 2019.

 [2] An Introduction to Probability and Statistics. V.K Rohatgi and Saleh, 3rd Edition, 2015 

Evaluation Pattern

CIA-50%

ESE-50%

MDS133 - MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE-I (2023 Batch)

Total Teaching Hours for Semester:45
No of Lecture Hours/Week:4
Max Marks:100
Credits:3

Course Objectives/Course Description

 

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, their spans and orthogonalization, linear transformations, and the use of their matrix bijections in applications to Data Science.

Course Outcome

CO1: Understand the properties of Vector spaces

CO2: Use the properties of Linear Maps in solving problems on Linear Algebra

CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces

CO4: Apply mathematics for some applications in Data Science

Unit-1
Teaching Hours:9
INTRODUCTION TO VECTOR SPACES
 

Vector Spaces: Definition and properties, Subspaces, Sums of Subspaces, Null space, Column space, Direct Sums, Span and Linear Independence, Bases, Dimension, Rank.

Unit-2
Teaching Hours:9
LINEAR TRANSFORMATIONS
 

Algebra of Linear Transformations, Null spaces and Injectivity, Range and Surjectivity, Fundamental Theorems of Linear Maps - Cayley-Hamilton theorem - Orthonormal basis.

Unit-3
Teaching Hours:9
EIGENVALUES AND EIGENVECTORS
 

Invariant Subspaces – Polynomials applied to Operators – Upper-Triangular matrices – Diagonal matrices – Invariant Subspaces on Real Vector Spaces – Eigenvalues and Eigenvectors – Characteristic equation – Diagonalization

Unit-4
Teaching Hours:9
INNER PRODUCT SPACES
 

Inner Products and Norms – Orthogonality – Orthogonal Bases – Orthogonal Projections – Gram-Schmidt process – Least squares problems – Applications to Linear models
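
The Gram-Schmidt process listed in this unit can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed implementation: it uses the modified variant (projections are subtracted from the running remainder, which is numerically more stable than the classical form), and the names `dot` and `gram_schmidt` as well as the tolerance are assumptions made for this example.

```python
from math import sqrt


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same subspace as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the projection of w onto each basis vector found so far.
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for row in q:
    print([round(x, 4) for x in row])
```

The resulting vectors have unit norm and pairwise inner product zero (up to rounding), which is the property that orthogonal projections and least squares problems rely on.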

Unit-5
Teaching Hours:9
BASIC MATRIX METHODS FOR APPLICATIONS
 

Matrix Norms – Singular value decomposition – Householder Transformation and QR decomposition – Non-Negative Matrix Factorization – Bidiagonalization

Text Books And Reference Books:

1. David C. Lay, Steven R. Lay, Judi J. McDonald (2016) Linear algebra and its applications. Pearson.

2. S. Axler, Linear algebra done right, Springer, 2017. 

3. Strang, G. (2006) Linear Algebra and its Applications: Thomson Brooks/Cole, Belmont, CA, USA.

Essential Reading / Recommended Reading

1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.

2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011. 

3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.  

4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS151 - APPLIED EXCEL (2023 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:3
Max Marks:50
Credits:1

Course Objectives/Course Description

 

This course is designed to build logical thinking ability and to provide hands-on experience in solving statistical models using MS Excel with problem-based learning. It also covers exploring and visualizing data using Excel formulas and data analysis tools.

Course Outcome

CO1: Demonstrate data management using Excel features.

CO2: Analyze the given problem and solve using Excel.

CO3: Infer the building blocks of Excel, Excel shortcuts, and sample data creation.

Unit-1
Teaching Hours:10
Layout and Properties
 

 File types - Spreadsheet structure - Menu bar - Quick access toolbar - Mini toolbar - Excel options - Formatting: Format painter - Font - Alignment - Number - Styles - Cells, Clear - Page layout Properties Symbols - Equation - Editing - Link - Filter - Charts - Formula Auditing - Overview of Excel tables and properties - Collecting sample data and arranging in definite format in Excel tables.

Lab :

1. Excel Formulas

2. Excel Tables and Properties

Unit-2
Teaching Hours:10
Files and Databases
 

Files

Importing data from different sources - Exporting data in different formats

Database

Creating database with the imported data - Data tools: text to column - identifying and removing duplicates - using format cell options (CO1, CO2)

Lab:

5. Import data

6. Export data

7. Creating database

8. Data tools

Unit-3
Teaching Hours:10
Functions
 

Application of functions - Concatenate - Upper - Lower - Trim - Repeat - Proper - Clean - Substitute - Convert - Left - Right - Mid - Len - Find - Exact - Replace - Text join - Value - Fixed etc. (CO2, CO3)

Lab:

9. Excel functions

Text Books And Reference Books:

 [1] Alexander M, Kusleika R and Walkenbach J, Microsoft Excel 2019 Bible, Wiley India Pvt Ltd, New Delhi, 2018.

 

Essential Reading / Recommended Reading

 

[1] Paul M, Microsoft Excel 2019 formulas and functions, Pearson Education, 2019

Evaluation Pattern

CIA-50%

ESE-50%

MDS161A - PRINCIPLES OF PROGRAMMING (2023 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:3
Max Marks:50
Credits:2

Course Objectives/Course Description

 

Students shall be able to understand the main principles of programming. The course also aims to give students practice in implementing these principles.

Course Outcome

CO1: Understand the fundamentals of programming languages.

CO2: Understand the design paradigms of programming languages.

CO3: Examine expressions, subprograms and their parameters.

Unit-1
Teaching Hours:10
Introduction to Syntax and Grammar
 

Introduction, Programming Languages, Syntax, Grammar, Ambiguity, Syntax and Semantics, Data Types (Primitive/Ordinal/Composite data types, Enumeration and sub-range types, Arrays and slices, Records, Unions, Pointers and pointer problems).

Unit-2
Teaching Hours:10
Constructing Expressions
 

Expressions, Type conversion, Implicit/Explicit conversion, Type systems, Expression evaluation, Control Structures, Binding and Types of Binding, Lifetime, Referencing Environment (Visibility, Local/Nonlocal/Global variables), Scope (Scope rules, Referencing operations, Static/Dynamic scoping).

Unit-3
Teaching Hours:10
Subprograms and Parameters
 

Subprograms (Signature, Types of Parameters, Formal/Actual parameters, Subprogram overloading, Parameter Passing Mechanisms, Aliasing, Eager/Normal-order/Lazy evaluation), Subprogram Implementation (Activation record, Static/Dynamic chain, Static chain method, Deep/Shallow access, Subprograms as parameters, Labels as parameters, Generic subprograms, Separate/Independent compilation).

Text Books And Reference Books:

1. Allen B. Tucker, Robert Noonan, Programming Languages: Principles and Paradigms, Tata McGraw Hill Education, 2006.

2. Bruce J. MacLennan, “Principles of Programming Languages: Design, Evaluation, and Implementation”, Third Edition, Oxford University Press (New York), 1999.

Essential Reading / Recommended Reading

1. T. W. Pratt, M. V. Zelkowitz, Programming Languages, Design and Implementation, Prentice Hall, Fourth Edition, 2001.

2. Robert Harper, Practical Foundations for Programming Languages, Second Edition, Cambridge University Press, 2016.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS161B - INTRODUCTION TO PROBABILITY AND STATISTICS (2023 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:3
Max Marks:50
Credits:2

Course Objectives/Course Description

 

This course is designed to introduce the historical development of statistics, presentation of data, descriptive measures and cultivate statistical thinking among students. This course also introduces the concept of probability. 

Course Outcome

CO1: Demonstrate, present and visualize data in various forms, statistically.

CO2: Understand and apply descriptive statistics.

CO3: Evaluation of probabilities for various kinds of random events

Unit-1
Teaching Hours:8
ORGANIZATION AND PRESENTATION OF DATA
 

Origin and development of Statistics - Scope - limitation and misuse of statistics - types of data: primary, secondary, quantitative and qualitative data - Types of Measurements: nominal, ordinal, ratio and scale - discrete and continuous data - Presentation of data by tables - graphical representation of a frequency distribution by histogram and frequency polygon - cumulative frequency distributions (inclusive and exclusive methods).

Unit-2
Teaching Hours:6
DESCRIPTIVE STATISTICS I
 

Measures of location or central tendency: Arithmetic mean - Median - Mode - Geometric mean - Harmonic mean.
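
The five measures of central tendency in this unit are all available in Python's standard `statistics` module, so they can be compared on one sample. The data below is an illustrative assumption; `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

data = [1, 2, 4, 4, 8]  # illustrative sample

arithmetic = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)             # middle value of the sorted data
mode = statistics.mode(data)                 # most frequent value
geometric = statistics.geometric_mean(data)  # n-th root of the product
harmonic = statistics.harmonic_mean(data)    # reciprocal of the mean reciprocal

print(arithmetic, median, mode, round(geometric, 4), round(harmonic, 4))
```

For positive data the three means always satisfy harmonic <= geometric <= arithmetic, with equality only when all values are equal.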

Unit-3
Teaching Hours:6
DESCRIPTIVE STATISTICS II
 

Partition values: Quartiles - Deciles and Percentiles - Measures of dispersion: Mean deviation - Quartile deviation - Standard deviation - Coefficient of variation - Moments: measures of skewness - kurtosis

Unit-4
Teaching Hours:10
BASICS OF PROBABILITY
 

Random experiment - sample point and sample space – event - algebra of events - Definition of Probability: classical - empirical and axiomatic approaches to probability - properties of probability - Theorems on probability - conditional probability and independent events - Laws of total probability - Bayes’ theorem and its applications.

Text Books And Reference Books:

1. David C. Lay, Steven R. Lay, Judi J. McDonald (2016) Linear algebra and its applications. Pearson.

2. S. Axler, Linear algebra done right, Springer, 2017.

3. Strang, G. (2006) Linear Algebra and its Applications: Thomson Brooks/Cole, Belmont, CA, USA.

Essential Reading / Recommended Reading

1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.

2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011. 

3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012. 

4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS161C - LINUX ESSENTIALS (2023 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:3
Max Marks:50
Credits:2

Course Objectives/Course Description

 

This course is designed to introduce Linux working environment to students. This course will enable students to understand the Linux system architecture, File and directory commands and foundations of shell scripting.

Course Outcome

CO1: Demonstrate the basic file and directory commands

CO2: Understand the Unix system environment

CO3: Apply shell programming concepts to solve given problem

Unit-1
Teaching Hours:10
Introduction
 

Introduction, Salient features, Unix system architecture, Unix Commands, Directory Related Commands, File Related Commands, Disk Related Commands, General utilities, Unix File System, Boot, inode, super and data block, in-core structure, Directories, conversion of path name to inode, inode to new file, Disk block Allocation

Unit-2
Teaching Hours:10
Process Management
 

Process state and data structures of a Process, Context of a Process, background processes, User versus Kernel mode, Process scheduling commands, Process terminating and examining commands, Secondary Storage Management: Formatting, making file system, checking disk space, mountable file system, disk partitioning

Unit-3
Teaching Hours:10
Shell Programming
 

Shell Programming, Vi Editor, Shell types, Shell command line processing, Shell script & its features, system and user defined variables, Executing shell scripts, expr command, Shell Screen Interface, read and echo statement, Shell Script arguments, Conditional Control Structures – if statement, case statement, Looping Control Structures – while, for, Jumping Control Structures – break, continue, exit.

 

Text Books And Reference Books:

[1] Linux: The Complete Reference, sixth edition, Richard Petersen, 2017

Essential Reading / Recommended Reading

[1] Linux Pocket Guide, Daniel J. Barrett, 3rd edition, O’Reilly

Evaluation Pattern

 

CIA 50% 

ESE 50%

MDS171 - PROGRAMMING USING PYTHON (2023 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:8
Max Marks:150
Credits:5

Course Objectives/Course Description

 

The objective of this course is to provide comprehensive knowledge of Python programming paradigms required for Data Science.

Course Outcome

CO1: Demonstrate the use of built-in objects of Python.

CO2: Demonstrate significant experience with the Python program development environment

CO3: Implement numerical programming, data handling and visualization through NumPy, Pandas and Matplotlib modules.

Unit-1
Teaching Hours:18
INTRODUCTION TO PYTHON
 

 Python and Computer Programming - Using Python as a calculator - Python memory management - Structure of Python Program - Branching and Looping - Problem Solving Using Branches and Loops - Lists and Mutability - Functions - Problem Solving Using Lists and Functions.

Lab Exercises

1. Demonstrate usage of branching and looping statements

2. Demonstrate Recursive functions

3. Demonstrate Lists

Unit-2
Teaching Hours:18
SEQUENCE DATATYPES AND OBJECT ORIENTED PROGRAMMING
 

 Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance - Exception Handling - Module: Built-in modules & user defined modules - Introduction to Regular Expressions using “re” module

Lab Exercises

1. Demonstrate Tuples, Sets and Dictionaries

2. Demonstrate inheritance and exception handling

3. Demonstrate use of “re” 
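
A single short program can touch several topics of this unit at once: classes, inheritance, exception handling and the `re` module. The sketch below is only one way to combine them; the class names, the custom exception, and the course-code pattern (three letters, three digits, optional trailing letter) are hypothetical and chosen for illustration.

```python
import re


class InvalidCourseCode(ValueError):
    """Raised when a string does not look like a course code."""


class Course:
    # Hypothetical pattern: three letters, three digits, optional letter.
    CODE_PATTERN = re.compile(r"^[A-Z]{3}\d{3}[A-Z]?$")

    def __init__(self, code, title):
        if not self.CODE_PATTERN.match(code):
            raise InvalidCourseCode(f"bad course code: {code!r}")
        self.code = code
        self.title = title

    def describe(self):
        return f"{self.code}: {self.title}"


class ElectiveCourse(Course):
    """Subclass that overrides describe() and reuses the parent's validation."""

    def describe(self):
        return super().describe() + " (elective)"


courses = [Course("MDS171", "Programming Using Python"),
           ElectiveCourse("MDS161C", "Linux Essentials")]
for course in courses:
    print(course.describe())

try:
    Course("171MDS", "Broken")
except InvalidCourseCode as exc:
    print("rejected:", exc)
```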

Unit-3
Teaching Hours:18
USING NUMPY
 

 Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays - Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data: NumPy’s Structured Array.

Lab Exercises

1. Demonstrate Aggregation

2. Demonstrate Indexing and Sorting

3. Demonstrate handling of missing data

4. Demonstrate hierarchical indexing

Unit-4
Teaching Hours:18
DATA MANIPULATION WITH PANDAS
 

 Introduction to Pandas Objects - Data indexing and Selection - Operating on Data in Pandas - Handling Missing Data - Hierarchical Indexing - Aggregation and Grouping - Pivot Tables - Vectorized String Operations - High Performance Pandas: eval() and query().

Lab Exercises

1. Demonstrate usage of Pivot table

2. Demonstrate use of eval() and query()

Unit-5
Teaching Hours:18
VISUALIZATION WITH MATPLOTLIB
 

Basics of matplotlib - Simple Line Plot and Scatter Plot - Density and Contour Plots - Histograms, Binnings and Density - Customizing Plot Legends - Multiple subplots - Three-Dimensional Plotting in Matplotlib.

Lab Exercises

1. Demonstrate Line plot and Scatter plot

2. Demonstrate 3D plotting

Text Books And Reference Books:

[1] Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data, O’Reilly Media, Inc., 2016

[2] Zhang. Y, An Introduction to Python and Computer Programming, Springer Publications, 2016

Essential Reading / Recommended Reading

[1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016

[2] T. R. Padmanabhan, Programming with Python, Springer Publications, 2016.

[3] M. Rajagopalan and P. Dhanavanthan, Statistical Inference, 1st ed., PHI Learning (P) Ltd., New Delhi, 2012.

[4] V. K. Rohatgi and E. Saleh, An Introduction to Probability and Statistics, 3rd ed., John Wiley & Sons Inc., New Jersey, 2015.

Evaluation Pattern

CIA 50%

ESE 50%

MDS231 - DESIGN AND ANALYSIS OF ALGORITHMS (2023 Batch)

Total Teaching Hours for Semester:45
No of Lecture Hours/Week:4
Max Marks:100
Credits:3

Course Objectives/Course Description

 

 

The course introduces techniques for designing and analyzing algorithms and data structures. It concentrates on techniques for evaluating the performance of algorithms. The objective is to understand different designing approaches like greedy, divide and conquer, dynamic programming etc. for solving different kinds of problems.

Course Outcome

CO1: Understand basic techniques for designing algorithms, including the techniques of recursion, divide-and-conquer, greedy algorithm etc.

CO2: Understand the mathematical criterion for deciding whether an algorithm is efficient and know many practically important problems that do not admit any efficient algorithms.

CO3: Apply classical sorting, searching, optimization and graph algorithms.

CO4: Design new algorithms and analyze their asymptotic and absolute runtime and memory demands.

Unit-1
Teaching Hours:9
Introduction
 

Algorithms, Analyzing algorithms, Complexity of algorithms, Growth of functions, Performance measurements, Sorting and order Statistics - Shell sort, Heap sort, Sorting in linear time.

Unit-2
Teaching Hours:9
Advanced Data Structures
 

 

Red-Black trees, B – trees, Binomial Heaps, Fibonacci Heaps, Tries, skip list.

Unit-3
Teaching Hours:9
Divide and Conquer
 

Quick sort, Merge sort, Finding maximum and minimum, Matrix Multiplication, Searching.

 

Greedy methods with examples such as Optimal Reliability Allocation, Knapsack, Minimum Spanning trees – Prim’s and Kruskal’s algorithms, Single source shortest paths – Dijkstra’s and Bellman-Ford algorithms, Optimal merge patterns.
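
As one example of the greedy single-source shortest path methods named above, the following is a minimal sketch of Dijkstra’s algorithm using a binary heap from Python's `heapq` module. The adjacency-dictionary representation and the sample graph are assumptions made for illustration; the sketch also assumes all edge weights are non-negative, which is the condition under which Dijkstra’s algorithm is correct (Bellman-Ford handles negative weights).

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; `graph` maps node -> {neighbour: weight}.

    Assumes all edge weights are non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Small illustrative graph (hypothetical data).
graph = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 5},
    "B": {"D": 1},
    "D": {},
}
print(dijkstra(graph, "A"))
```

The greedy step is the pop from the heap: the closest unsettled vertex is finalized first, and its distance is never revised afterwards.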

Unit-4
Teaching Hours:9
Dynamic Programming
 

 

Dynamic programming with examples such as Knapsack, All pair shortest paths – Warshall’s and Floyd’s algorithms, Resource allocation problem. Backtracking, Branch and Bound with examples such as Travelling Salesman Problem, Graph Coloring, n-Queen Problem, Hamiltonian Cycles and Sum of subsets.
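
The knapsack example from this unit can be sketched with a one-dimensional dynamic programming table. This is a minimal illustration of the 0/1 variant, assuming non-negative integer weights; the function name and the sample items are hypothetical.

```python
def knapsack(weights, values, capacity):
    """Maximum total value of items that fit within `capacity` (0/1 knapsack)."""
    best = [0] * (capacity + 1)  # best[c] = best value achievable with capacity c
    for w, v in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


print(knapsack(weights=[1, 3, 4, 5], values=[1, 4, 5, 7], capacity=7))
```

The table has capacity + 1 entries and each item updates it once, so the running time is proportional to the number of items times the capacity.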

Unit-5
Teaching Hours:9
Unit V
 

Algebraic Computation, Fast Fourier Transform, String Matching, Theory of NP-completeness, Approximation algorithms and Randomized algorithms.

Text Books And Reference Books:

 [1] Cormen, Leiserson, Rivest, “Introduction to Algorithms”, PHI, 2001

 [2] Horowitz & Sahni, “Fundamentals of Computer Algorithms”, Galgotia Publications, 2nd Edition.

Essential Reading / Recommended Reading

[1] Aho, Hopcroft, Ullman, “The Design and Analysis of Computer Algorithms”, Pearson Education, 2008.

[2] Donald E. Knuth, The Art of Computer Programming Volume 3, Sorting and Searching, 2nd Edition, Pearson Education, Addison-Wesley, 1998.

[3] G. A. V. Pai, Data structures and Algorithms, Tata McGraw Hill, Jan 2008.

Evaluation Pattern

 

CIA 50% 

ESE 50%

MDS232 - MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE-II (2023 Batch)

Total Teaching Hours for Semester:45
No of Lecture Hours/Week:3
Max Marks:100
Credits:3

Course Objectives/Course Description

 

This course aims at introducing data science related essential mathematics concepts such as fundamentals of topics on Calculus of several variables, Orthogonality, Convex optimization, and Graph Theory.

Course Outcome

CO1: Demonstrate the properties of multivariate calculus

CO2: Use the idea of orthogonality and projections effectively

CO3: Have a clear understanding of Convex Optimization

CO4: Know the basic terminology and properties of Graph Theory

Unit-1
Teaching Hours:9
Calculus of Several Variables
 

Functions of Several Variables: Functions of two, three variables - Limits and continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables - Partial Derivatives: partial derivative of functions of two variables, partial derivatives of functions of more than two variables - The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces

Unit-2
Teaching Hours:9
Orthogonality
 

Perpendicular vectors and Orthogonality - Inner Products and Projections onto lines - Projections of Rank one - Projections and Least Squares Approximations - Projection Matrices - Orthogonal Bases, Orthogonal Matrices.
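
Illustrative sketch (not part of the prescribed text): projection onto a line, listed above, uses p = (a·b / a·a) a, and the error b - p is orthogonal to a. The helper names and sample vectors below are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def project_onto_line(b, a):
    """Project vector b onto the line spanned by the nonzero vector a."""
    scale = dot(a, b) / dot(a, a)
    return [scale * x for x in a]

a = [1.0, 2.0, 2.0]
b = [3.0, 0.0, 3.0]
p = project_onto_line(b, a)                # projection of b onto the line through a
error = [bi - pi for bi, pi in zip(b, p)]  # component of b perpendicular to a
```

Checking that dot(error, a) is zero is a quick way to confirm the projection numerically.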

Unit-3
Teaching Hours:9
Introduction to Convex Optimization
 

Affine and Convex Sets: Lines and line segments, affine sets, affine dimension and relative interior, convex sets, cones - Hyperplanes and half-spaces - Euclidean balls and ellipsoids - Norm balls and norm cones - Polyhedra.

Unit-4
Teaching Hours:9
Graph Theory - Basics
 

Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, complete graphs, bipartite graphs, complete bipartite graphs - Vertex degree: adjacency and incidence, regular graphs - subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs.

Unit-5
Teaching Hours:9
Graph Theory - More concepts
 

Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and its properties, Bridges (cut-edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems - Applications of Graph Theory
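
Illustrative sketch (not part of the prescribed text): the minimal spanning tree problem listed above can be solved with Kruskal's algorithm, one of several standard methods. The function name, the union-find helper, and the sample edge list are assumptions for illustration.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree.

    edges is a list of (weight, u, v) tuples for an undirected graph.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Find the representative of v's component, with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree

# Hypothetical weighted graph on vertices 0..3
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
mst = kruskal(4, edges)
total_weight = sum(w for _, _, w in mst)
```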

Text Books And Reference Books:

[1] M. D. Weir, J. Hass, and G. B. Thomas, Thomas' calculus. Pearson, 2016. (Unit 1)

[2] G Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006. (Unit 2)

[3] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge Univ. Pr., 2011. (Unit 3)

[4] J Clark, D A Holton, A first look at Graph Theory, Allied Publishers India, 1995. (Unit 4)

Essential Reading / Recommended Reading

[1] J. Patterson and A. Gibson, Deep learning: a practitioner's approach. O'Reilly Media, 2017

[2] S. Sra, S. Nowozin, and S. J. Wright, Optimization for machine learning. MIT Press, 2012

[3] D. Jungnickel, Graphs, networks and algorithms. Springer, 2014

[4] D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018

[5] P. N. Klein, Coding the matrix: linear algebra through applications to computer science. Newtonian Press, 2015 

[6] K H Rosen, Discrete Mathematics and its applications, 7th ed., McGraw Hill, 2016

Evaluation Pattern

 CIA 50% , ESE 50%

MDS271 - DATABASE TECHNOLOGIES (2023 Batch)

Total Teaching Hours for Semester:75
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate the construction of relational databases and the writing of effective queries, and introduces data warehouses and NoSQL databases and their types.

Course Outcome

CO1: Demonstrate various databases and compose effective queries

CO2: Understand the process of OLAP system construction

CO3: Develop applications using Relational and NoSQL databases

Unit-1
Teaching Hours:15
Introduction
 

Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features.

 

Lab Exercises

1. Data Definition

 

2. Table Creation

 

Unit-2
Teaching Hours:12
Relational model and database design
 

SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, Assertions, Views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization using functional dependencies, Boyce-Codd Normal Form.
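
Illustrative sketch (not part of the prescribed text): the interplay of referential integrity, null values and aggregate functions listed above can be tried with Python's built-in sqlite3 module. The table names, column names and rows are hypothetical, and SQLite is an assumed stand-in for whichever DBMS the lab uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# DDL: two tables linked by a referential integrity constraint
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL,
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")

# DML: sample rows (one salary is NULL)
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "Sales"), (2, "Research")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 50000, 1), (2, "Ravi", 60000, 1), (3, "Meena", None, 2)],
)

# Aggregate functions skip NULL values; AVG over only NULLs yields NULL
rows = conn.execute("""
    SELECT d.name, COUNT(e.emp_id), AVG(e.salary)
    FROM department d JOIN employee e ON e.dept_id = d.dept_id
    GROUP BY d.name ORDER BY d.name
""").fetchall()

# A row pointing at a non-existent department violates referential integrity
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Kiran', 40000, 99)")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True
```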

Lab Exercises

1. Insert, Select, Update & Delete Commands

2. Nested Queries & Join Queries

 

3. Views

Unit-3
Teaching Hours:13
Data warehouse: the building blocks
 

Defining Features, Database and Data Warehouses, Architectural Types, Overview of  the Components, Metadata in the Data warehouse, The Star Schema, Star Schema Keys,  Advantages of the Star Schema, Star Schema: Examples, Snowflake Schema, Aggregate Fact Tables. 

 

  Lab Exercises

  1. Importing source data structures

  2. Design Target Data Structures

 

  3. Create target multidimensional cube

 

Unit-4
Teaching Hours:12
Data Integration and Data Flow (ETL)
 

Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables, Real-Time ETL Systems

 

Lab Exercises

 

  1. Perform the ETL process and transform into data map

  2. Create the cube and process it

  3. Generating Reports

  4. Creating the Pivot table and pivot chart using some existing data

 

Unit-5
Teaching Hours:12
NOSQL Databases
 

Introduction to NOSQL Systems, The CAP Theorem, Document-Based NOSQL Systems and MongoDB, NOSQL Key-Value Stores, Column-Based or Wide Column NOSQL Systems, Graph databases, Multimedia databases.

Lab Exercises

1. MongoDB Exercise - 1

 

2. MongoDB Exercise - 2

Text Books And Reference Books:

[1] Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.

[2] Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007.

[3] The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.

Essential Reading / Recommended Reading

[1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.

Evaluation Pattern

 

CIA 50%, ESE 50%

MDS272 - INFERENTIAL STATISTICS USING R (2023 Batch)

Total Teaching Hours for Semester:75
No of Lecture Hours/Week:7
Max Marks:100
Credits:4

Course Objectives/Course Description

 

Statistical inference plays an important role when analyzing data and making decisions based on real-world phenomena. This course aims to teach students to test hypotheses and estimate parameters for real life data sets.

 

Course Outcome

CO1: Demonstrate the concepts of population and samples

CO2: Apply the idea of sampling distribution of different statistics in testing of hypothesis

CO3: Estimate the unknown population parameters using the concepts of point and interval estimations using R.

CO4: Test the hypothesis using nonparametric tests for real world problems using R.

Unit-1
Teaching Hours:15
INTRODUCTION
 

Population and Statistics – Finite and Infinite population – Parameter and Statistics – Types of sampling - Sampling Distribution – Sampling Error - Standard Error – Test of significance –concept of hypothesis – types of hypothesis – Errors in hypothesis-testing – Critical region – level of significance - Power of the test – p-value.
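
Illustrative sketch (not part of the prescribed text): standard error, critical region and power can be computed for an upper-tailed test of a normal mean with known σ using Python's standard library. The course itself uses R; the function names and numbers here are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the sample mean for known population sigma."""
    return sigma / sqrt(n)

def power_upper_tailed_z(mu0, mu1, sigma, n, alpha):
    """Power of the test H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    se = standard_error(sigma, n)
    # Critical region: reject H0 when the sample mean exceeds this value
    critical_value = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power = P(sample mean falls in the critical region | true mean is mu1)
    return 1 - NormalDist(mu1, se).cdf(critical_value)

se = standard_error(sigma=10, n=25)
power = power_upper_tailed_z(mu0=50, mu1=55, sigma=10, n=25, alpha=0.05)
size = power_upper_tailed_z(mu0=50, mu1=50, sigma=10, n=25, alpha=0.05)
```

When the true mean equals the hypothesised mean, the rejection probability reduces to the level of significance, which is what the last line evaluates.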

Lab Exercises:

1. Calculation of sampling error and standard error

 

2. Calculation of probability of critical region using standard distributions

 

3. Calculation of power of the test using standard distributions.

 

 

Unit-2
Teaching Hours:15
Testing of Hypothesis I
 

 

Concept of large and small samples – Tests concerning a single population mean for known σ (and unknown σ) – equality of two means for known σ (and unknown σ) – Test for single variance – Test for equality of two variances for normal population – Tests for single proportion – Tests of equality of two proportions for the normal population.

Lab Exercises:

1. Test of the single sample mean for known and unknown σ.

 

2. Test of equality of two means for known and unknown σ.

 

3. Tests of single variance and equality of variance for large samples.

 

4. Tests for single proportion and equality of two proportions for large samples.

 

 

 

Unit-3
Teaching Hours:15
Testing of Hypothesis II
 

 

Student’s t-distribution and its properties (without proofs) – Single sample mean test – Independent sample mean test – Paired sample mean test – Tests of proportion (based on t distribution) – F distribution and its properties (without proofs) – Tests of equality of two variances using F-test – Chi-square distribution and its properties (without proofs) – Chi-square test for independence of attributes – Chi-square test for goodness of fit.

Lab Exercises:

1. Single sample mean test

2. Independent and Paired sample mean test

3. Tests of proportion of one and two samples based on t-distribution

 

 

Unit-4
Teaching Hours:15
Analysis of Variance
 

 

Meaning and assumptions - Fixed, random and mixed effect models - Analysis of variance of one-way and two-way classified data with and without interaction effects – Multiple comparison tests: Tukey’s method - critical difference.

Lab Exercises:

1. Test of equality of two variances

2. Chi-square test for independence of attributes and goodness of fit.

3. Construction of one-way ANOVA

4. Construction of two-way ANOVA with interaction

5. Construction of two-way ANOVA without interaction

 

 

 

Unit-5
Teaching Hours:15
Nonparametric Tests
 

 

Concept of Nonparametric tests - Run test for randomness - Sign test and Wilcoxon Signed Rank Test for one and paired samples - Run test - Median test and Mann-Whitney-Wilcoxon tests for two samples.
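
Illustrative sketch (not part of the prescribed text): the one-sample sign test listed above reduces to a binomial calculation once observations are classified as above or below the hypothesised median. The course uses R; the function name and data below are assumptions for illustration.

```python
from math import comb

def sign_test_p_value(sample, median0):
    """Two-sided exact sign test p-value for H0: population median = median0."""
    plus = sum(1 for x in sample if x > median0)
    minus = sum(1 for x in sample if x < median0)
    n = plus + minus          # observations equal to median0 are dropped
    k = min(plus, minus)
    # Under H0 the number of plus signs is Binomial(n, 1/2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

data = [12, 15, 11, 18, 14, 17, 16, 13, 19, 20]  # hypothetical measurements
p_value = sign_test_p_value(data, median0=12.5)
```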

Lab Exercises:

 

1. Multiple comparison test using Tukey’s method and critical difference methods

2. Test of one sample using Run and sign tests

3. Test of paired sample using Wilcoxon signed rank test

4. Test of two samples using Run test and Median test

Text Books And Reference Books:

1. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 12th edition, Sultan Chand & Sons, New Delhi, 2020.

2. Brian Caffo, Statistical Inference for Data Science, Leanpub, 2016.

Essential Reading / Recommended Reading

1. Walpole R.E, Myers R.H and Myers S.L, Probability and Statistics for Engineers and Scientists, 9th edition, Pearson, New Delhi, 2017.

2. Montgomery, D. C., & Runger, G. C. (2010). Applied statistics and probability for engineers. John Wiley & Sons.

3. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012.

4. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc, New Jersey, 2015.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS273 - FULL STACK WEB DEVELOPMENT (2023 Batch)

Total Teaching Hours for Semester:75
No of Lecture Hours/Week:7
Max Marks:100
Credits:4

Course Objectives/Course Description

 

On completion of this course, a student will be familiar with full stack development, be able to develop web applications using advanced technologies, and have cultivated good web programming style and discipline by solving real-world scenarios.

Course Outcome

CO1: Apply JavaScript, HTML5, and CSS3 effectively to create interactive and dynamic websites.

CO2: Describe the main technologies and methods currently used in creating advanced web applications.

CO3: Design websites using appropriate security principles, focusing specifically on the vulnerabilities inherent in common web implementations.

CO4: Create modern web applications using MEAN.

Unit-1
Teaching Hours:15
OVERVIEW OF WEB TECHNOLOGIES AND HTML5
 

Internet and web Technologies- Client/Server model -Web Search Engine-Web Crawling-Web Indexing-Search Engine Optimization and Limitations-Web Services –Collective Intelligence –Mobile Web –Features of Web 3.0-HTML vs HTML5-Exploring Editors and Browsers Supported by HTML5-New Elements-HTML5 Semantics-Canvas-HTML Media

Lab Exercises

 

1. Develop static pages for a given scenario using HTML

2. Creating Web Animation with audio using HTML5 & CSS3

3. Demonstrate Geolocation and Canvas using HTML5

 

Unit-2
Teaching Hours:15
XML AND AJAX
 

XML-Documents and Vocabularies-Versions and Declaration-Namespaces-JavaScript and XML: Ajax-DOM based XML processing Event-Transforming XML Documents-Selecting XML Data: XPath-Template based Transformations: XSLT-Displaying XML Documents in Browsers-Evolution of AJAX-Web applications with AJAX-AJAX Framework

Lab Exercises

1. Write an XML file and validate the file using XSD

2.  Demonstrate XSL with XSD

3. Demonstrate DOM parser

 

 

 

Unit-3
Teaching Hours:15
CLIENT SIDE SCRIPTING
 

 

JavaScript Implementation - Use JavaScript to interact with some of the new HTML5 APIs - Create and modify JavaScript objects - JS Forms - Events and Event handling - JS Navigator - JS Cookies - Introduction to JSON - JSON vs XML - JSON Objects - Importance of AngularJS in web - Angular Expressions and Directives - Single Page Application

Lab Exercises

1. Write a JavaScript program to demonstrate Form Validation and Event Handling

2. Create a web application using AngularJS with Forms

 

Unit-4
Teaching Hours:15
SERVER SIDE SCRIPTING
 

Introduction to Node.js - REPL Terminal - Package Manager (NPM) - Node.js Modules and filesystem - Node.js Events - Debugging Node.js Applications - File System and streams - Testing Node.js with Jasmine

Lab Exercises

1. Implement a single page web application with CRUD operations using AngularJS

2. Implement a web application using AJAX with JSON

3. Demonstrate fetching information from an XML file with AJAX

 

 

Unit-5
Teaching Hours:15
NODE JS WITH MYSQL
 

Introduction to MySQL - Performing basic database operations (DML) (Insert, Delete, Update, Select) - Prepared Statement - Uploading Image or File to MySQL - Retrieve Image or File from MySQL

Lab Exercises

1. Demonstrate Node.js file system module

2. Implement MySQL with Node.js

3. Implement CRUD Operation using MongoDB

 

Text Books And Reference Books:

[1] Internet and World Wide Web: How to Program, Paul Deitel, Harvey Deitel & Abbey Deitel, Pearson Education, 5th Edition, 2018.

[2] HTML 5 Black Book (Covers CSS3, JavaScript, XML, XHTML, AJAX, PHP, jQuery), DT Editorial Services, Dreamtech Press, 2nd Edition, 2016.

Essential Reading / Recommended Reading

[1] Chris Northwood, The Full Stack Developer: Your Essential Guide to the Everyday Skills Expected of a Modern Full Stack Web Developer, Apress Publications, 1st Edition, 2018.

[2] Laura Lemay, Rafe Colburn & Jennifer Kyrnin, Mastering HTML, CSS & Javascript Web Publishing, BPB Publications, 1st Edition, 2016.

[3] Alex Giamas, Mastering MongoDB 3.x, Packt Publishing Limited, First Edition, 2017.

 

Web Resources:

 

[1] www.w3cschools.com

[2] http://www.php.net/docs.php

 

Evaluation Pattern

CIA - 50%

ESE- 50%

MDS311 - PROGRAMMING FOR DATA SCIENCE IN R (2022 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

This lab is designed to introduce implementation of practical machine learning algorithms using R programming language. The lab will extensively use datasets from real life situations. 

Course Outcome

Unit-1
Teaching Hours:6
R INSTALLATION, SETUP AND LINEAR REGRESSION
 

Download and install R – R IDE environments – Why R – Getting started with R – Vectors and Data Frames – Loading Data Frames – Data analysis with summary statistics and scatter plots – Summary tables – Working with Script Files – Linear Regression – Introduction – Regression model for one variable regression – Selecting best model – Error measures SSE, SST, RMSE, R² – Interpreting R² – Multiple linear regression – Lasso and ridge regression – Correlation – Recitation – A minimum of 3 data sets for practice

Unit-2
Teaching Hours:6
LOGISTIC REGRESSION
 

 Logistic Regression – The Logit – Confusion matrix – sensitivity, specificity – ROC curve – Threshold selection with ROC curve – Making predictions – Area under the ROC curve (AUC) - Recitation – A minimum of 3 data sets for practice
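
Illustrative sketch (not part of the prescribed text): sensitivity and specificity follow directly from the confusion matrix obtained by thresholding predicted probabilities. The lab itself uses R; Python is used here only for illustration, and the function name, labels and probabilities are assumptions.

```python
def confusion_counts(actual, predicted_prob, threshold=0.5):
    """Count (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Hypothetical true labels and model probabilities
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]

tp, fp, tn, fn = confusion_counts(actual, probs, threshold=0.5)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
```

Repeating the calculation over a grid of thresholds and plotting sensitivity against 1 - specificity traces out the ROC curve.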

Unit-3
Teaching Hours:6
DECISION TREES
 

Approaches to missing data – Data imputation – Multiple imputation – Classification and Regression Trees (CART) – CART with Cross Validation – Predictions from CART – ROC curve for CART – Random Forests – Building many trees – Parameter selection – K-fold Cross Validation – Recitation – A minimum of 3 data sets for practice

Unit-4
Teaching Hours:6
TEXT ANALYTICS AND NLP
 

 Using text as data – Text analytics – Natural language processing – Bag of words – Stemming – word clouds – Recitation – min 3 data sets for practice – Time series analysis – Clustering – k-mean clustering – Random forest with clustering – Understanding cluster patterns – Impact of clustering – Heatmaps – Recitation – min 3 data sets for practice

Unit-5
Teaching Hours:6
ENSEMBLE MODELLING
 

Support Vector Machines – Gradient Boosting – Naive Bayes – Bayesian GLM – GLMNET – Ensemble modeling – Experimenting with all of the above approaches (Units 1-5) with and without data imputation and assessing predictive accuracy – Recitation – min 3 data sets for practice – PROJECT: A concluding project work carried out individually for a common data set

Text Books And Reference Books:

[1]. Statistics : An Introduction Using R, Michael J. Crawley, WILEY, Second Edition, 2015.

 

Essential Reading / Recommended Reading

[1]. Hands-on programming with R, Garrett Grolemund, O’Reilly, 1st Edition, 2014

 [2]. R for everyone, Jared Lander, Pearson, 1st Edition, 2014

Evaluation Pattern

CIA 50%

ESE 50%

MDS331 - NEURAL NETWORKS AND DEEP LEARNING (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

The main aim of this course is to provide fundamental knowledge of neural networks and deep learning. On successful completion of the course, students will understand the basics of neural networks, shallow and deep neural networks, and the forward and backward propagation processes, and will be able to build various research projects.

Course Outcome

CO1: Understand the major technology trends in neural networks and deep learning.

CO2: Build, train and apply neural networks and fully connected deep neural networks

CO3: Implement efficient (vectorized) neural networks for real time application.

Unit-1
Teaching Hours:12
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS
 

Neural Networks-Application Scope of Neural Networks- Fundamental Concept of ANN: The Artificial Neural Network-Biological Neural Network-Comparison between Biological Neuron and Artificial Neuron-Evolution of Neural Network. Basic models of ANN-Learning Methods-Activation Functions-Important Terminologies of ANN.

Unit-2
Teaching Hours:12
SUPERVISED LEARNING NETWORK
 

Shallow neural networks- Perceptron Networks-Theory-Perceptron Learning Rule-Architecture-Flowchart for training Process-Perceptron Training Algorithm for Single and Multiple Output Classes. Back Propagation Network- Theory-Architecture-Flowchart for training process-Training Algorithm-Learning Factors for Back-Propagation Network. Radial Basis Function Network RBFN: Theory, Architecture, Flowchart and Algorithm.
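
Illustrative sketch (not part of the prescribed text): the perceptron learning rule for a single output class updates weights by learning_rate × (target − output) × input whenever a sample is misclassified. The function names, step threshold and the AND-gate data below are assumptions for illustration.

```python
def step(net):
    """Binary step activation."""
    return 1 if net > 0 else 0

def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Train a single-output perceptron; samples is a list of (inputs, target)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            net = bias + sum(w * x for w, x in zip(weights, inputs))
            error = target - step(net)
            if error != 0:
                # Perceptron learning rule: move the boundary toward the sample
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return weights, bias

# Linearly separable example: logical AND
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [step(bias + sum(w * x for w, x in zip(weights, inputs)))
               for inputs, _ in and_gate]
```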

Unit-3
Teaching Hours:12
CONVOLUTIONAL NEURAL NETWORK
 

 Introduction - Components of CNN Architecture - Rectified Linear Unit (ReLU) Layer - Exponential Linear Unit (ELU, or SELU) - Unique Properties of CNN -Architectures of CNN -Applications of CNN.

Unit-4
Teaching Hours:12
RECURRENT NEURAL NETWORK
 

 Introduction- The Architecture of Recurrent Neural Network- The Challenges of Training Recurrent Networks- Echo-State Networks- Long Short-Term Memory (LSTM) - Applications of RNN.

Unit-5
Teaching Hours:12
AUTO ENCODER AND RESTRICTED BOLTZMANN MACHINE
 

Introduction - Features of Autoencoder - Types of Autoencoder - Restricted Boltzmann Machine - Boltzmann Machine - RBM Architecture - Example - Types of RBM.

Text Books And Reference Books:

1. S.N.Sivanandam, S. N. Deepa, Principles of Soft Computing, Wiley-India, 3rd Edition, 2018.

2. Dr. S Lovelyn Rose, Dr. L Ashok Kumar, Dr. D Karthika Renuka, Deep Learning Using Python, Wiley-India, 1st Edition, 2019.

Essential Reading / Recommended Reading

1. Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, September 2018.

2. Francois Chollet, Deep Learning with Python, Manning Publications; 1st edition, 2017

3. John D. Kelleher, Deep Learning (MIT Press Essential Knowledge series), The MIT Press, 2019.

Evaluation Pattern

CIA 50%

ESE 50%

MDS341A - TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

This course covers applied statistical methods pertaining to time series and forecasting techniques. Moving average models such as simple, weighted and exponential are dealt with. Stationary and non-stationary time series models such as AR, MA, ARMA and ARIMA are introduced to analyse time series data.

Course Outcome

Unit-1
Teaching Hours:12
UNIT 1
 

INTRODUCTION TO TIME SERIES AND STOCHASTIC PROCESS

Introduction to time series and stochastic process, graphical representation, components and classical decomposition of time series data. Auto-covariance and auto-correlation functions, Exploratory time series analysis, Test for trend and seasonality, Smoothing techniques such as Exponential and moving average smoothing, Holt-Winters smoothing, Forecasting based on smoothing.
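
Illustrative sketch (not part of the prescribed text): simple exponential smoothing, one of the smoothing techniques named above, updates each smoothed value as alpha × observation + (1 − alpha) × previous smoothed value. The function name, the initialisation with the first observation, and the sample series are assumptions for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the exponentially smoothed series, initialised at the first value."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 14, 16]  # hypothetical observations
smoothed = simple_exponential_smoothing(series, alpha=0.5)
one_step_forecast = smoothed[-1]  # forecast for the next period
```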

Unit-2
Teaching Hours:12
Unit 2
 

STATIONARY TIME SERIES MODELS

Wold representation of linear stationary processes, Study of linear time series models: Autoregressive, Moving Average and Autoregressive Moving Average models and their statistical properties such as ACF and PACF.

Unit-3
Teaching Hours:12
Unit 3
 

ESTIMATION OF ARMA MODELS

Estimation of ARMA models: Yule-Walker estimation of AR Processes, Maximum likelihood and least squares estimation for ARMA Processes, Residual analysis and diagnostic checking.

Unit-4
Teaching Hours:12
Unit 4
 

NON-STATIONARY TIME SERIES MODELS

Concept of non-stationarity, general unit root tests for testing non-stationarity; basic formulation of the ARIMA Model and its statistical properties: ACF and PACF.

Unit-5
Teaching Hours:12
Unit 5
 

STATE SPACE MODELS

Filtering, smoothing and forecasting using state space models, Kalman smoother, Maximum likelihood estimation, Missing data modifications

Text Books And Reference Books:

1. George E. P. Box, G. M. Jenkins, G. C. Reinsel and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th Edition, John Wiley & Sons, Inc., New Jersey, 2016.

2. Montgomery D. C., Jennings C. L. and Kulahci M., Introduction to Time Series Analysis and Forecasting, 2nd Edition, John Wiley & Sons, Inc., New Jersey, 2016.

Essential Reading / Recommended Reading

1. Anderson T.W, Statistical Analysis of Time Series, John Wiley & Sons, Inc., New Jersey, 1971.

2. Shumway R.H and Stoffer D.S, Time Series Analysis and its Applications with R Examples, Springer, 2011.

3. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer-Verlag, 2009.

4. S.C. Gupta and V.K. Kapoor, Fundamentals of Applied Statistics, 4th Edition, Sultan Chand and Sons, 2008.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS341B - BAYESIAN INFERENCE (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

To equip the students with the knowledge of conceptual, computational, and practical methods of Bayesian data analysis. 

Course Outcome

Unit-1
Teaching Hours:12
INTRODUCTION
 

Basics on minimaxity: subjective and frequentist probability, Bayesian inference, Bayesian estimation, prior distributions, posterior distribution, loss function, principle of minimum expected posterior loss, quadratic and other common loss functions, advantages of being a Bayesian, HPD confidence intervals, testing, credible intervals, prediction of a future observation.

Unit-2
Teaching Hours:12
BAYESIAN ANALYSIS WITH PRIOR INFORMATION
 

Robustness and sensitivity, classes of priors, conjugate class, neighbourhood class, density ratio class, different methods of objective priors: Jeffreys’ prior, probability matching prior, conjugate priors and mixtures, posterior robustness: measures and techniques
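
Illustrative sketch (not part of the prescribed text): a conjugate prior keeps the posterior in the same family as the prior. For a Beta(a, b) prior on a binomial success probability, observing k successes in n trials gives a Beta(a + k, b + n − k) posterior. The function names and numbers below are assumptions for illustration.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Posterior Beta parameters after binomial data under a Beta prior."""
    return prior_a + successes, prior_b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical example: Beta(2, 2) prior, 7 successes in 10 trials
post_a, post_b = beta_binomial_update(2, 2, successes=7, trials=10)
prior_mean = beta_mean(2, 2)
posterior_mean = beta_mean(post_a, post_b)
```

The posterior mean lies between the prior mean and the sample proportion, which is one way to see how the prior and the data are combined.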

Unit-3
Teaching Hours:12
MULTIPARAMETER AND MULTIVARIABLE MODELS
 

 Basics of decision theory, multi-parameter models, Multivariate models, linear regression, asymptotic approximation to posterior distributions.

Unit-4
Teaching Hours:12
MODEL SELECTION AND HYPOTHESIS TESTING
 

 Selection criteria and testing of hypothesis based on objective probabilities and Bayes’ factors, large sample methods: limit of posterior distribution, consistency of posterior distribution, asymptotic normality of posterior distribution.

Unit-5
Teaching Hours:12
BAYESIAN COMPUTATIONS
 

Analytic approximation, E-M Algorithm, Monte Carlo sampling, Markov Chain Monte Carlo Methods, Metropolis-Hastings Algorithm, Gibbs sampling, examples, convergence issues

Text Books And Reference Books:

1. Albert Jim (2009) Bayesian Computation with R, second edition, Springer, New York

2. Bolstad W. M. and Curran, J.M. (2016) Introduction to Bayesian Statistics 3rd Ed. Wiley, New York

3. Christensen R., Johnson W., Branscum A. and Hanson T.E. (2011) Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians, Chapman and Hall, London

4. Gelman A., Carlin J.B., Stern H.S. and Rubin D.B. (2004) Bayesian Data Analysis, 2nd Ed. Chapman & Hall

Essential Reading / Recommended Reading

1. Congdon P. (2006) Bayesian Statistical Modeling, Wiley, New York.

2. Ghosh J.K., Delampady M. and Samanta T. (2006). An Introduction to Bayesian Analysis: Theory and Methods, Springer, New York.

3. Lee P.M. (2012) Bayesian Statistics: An Introduction-4th Ed. Hodder Arnold, New York.

4. Rao C.R. and Dey D. (2006) Bayesian Thinking, Modeling and Computation, Handbook of Statistics, Vol. 25.

Evaluation Pattern

CIA 50%

ESE 50%

MDS341C - ECONOMETRICS (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

The course is designed to teach the principles of econometric methods and tools, and to strengthen students’ understanding of the role of econometrics in the study of economics and finance. It aims to give students the basic knowledge and skills of econometric analysis, so that they can apply them to the investigation of economic relationships and processes, and can follow the econometric methods, approaches, ideas, results and conclusions found in most economics books and articles. It also introduces students to the traditional econometric methods developed mostly for work with cross-sectional data.

Course Outcome

Unit-1
Teaching Hours:15
INTRODUCTION
 

Introduction to Econometrics- Meaning and Scope – Methodology of Econometrics – Nature and Sources of Data for Econometric analysis – Types of Econometrics

Unit-2
Teaching Hours:15
CORRELATION
 

Aitken’s Generalised Least Squares (GLS) Estimator, Heteroscedasticity, Auto-correlation, Test of Auto-correlation, Multicollinearity, Tools for Handling Multicollinearity

Unit-3
Teaching Hours:15
REGRESSION
 

Linear Regression with Stochastic Regressors, Errors in Variable Models and Instrumental Variable Estimation, Independent Stochastic linear Regression, Auto regression, Linear regression, Lag Models

Unit-4
Teaching Hours:15
LINEAR EQUATIONS MODEL
 

 Simultaneous Linear Equations Model : Structure of Linear Equations Model, Identification Problem, Rank and Order Conditions, Single Equation and Simultaneous Equations, Methods of Estimation- Indirect Least squares, Least Variance Ratio and Two Stage Least Square

Text Books And Reference Books:

1. Johnston, J. (1997). Econometric Methods, Fourth Edition, McGraw Hill

2. Gujarati, D., and Porter, D. (2008). Basic Econometrics, Fifth Edition, McGraw Hill

Essential Reading / Recommended Reading

1. Intriligator, M. D. (1980). Econometric Models-Techniques and Applications, Prentice Hall.

2. Theil, H. (1971). Principles of Econometrics, John Wiley.

3. Walters, A. (1970). An Introduction to Econometrics, Macmillan and Co.

Evaluation Pattern

CIA 50%

ESE 50%

MDS341D - BIO-STATISTICS (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

This course provides an understanding of various statistical methods in describing and analyzing biological data. Students will be equipped with an idea about the applications of statistical hypothesis testing, related concepts and interpretation in biological data.

Course Outcome

Unit-1
Teaching Hours:12
INTRODUCTION TO BIOSTATISTICS
 

Presentation of data - graphical and numerical representations of data - Types of variables, measures of location - dispersion and correlation - inferential statistics - probability and distributions - Binomial, Poisson, Negative Binomial, Hypergeometric and normal distribution.

Unit-2
Teaching Hours:12
PARAMETRIC AND NON - PARAMETRIC METHODS
 

Parametric methods - one sample t-test - independent sample t-test - paired sample t-test - one-way analysis of variance - two-way analysis of variance - analysis of covariance - repeated measures of analysis of variance - Pearson correlation coefficient - Non parametric methods: Chi-square test of independence and goodness of fit - Mann Whitney U test - Wilcoxon signed-rank test - Kruskal Wallis test - Friedman’s test - Spearman’s correlation test

Unit-3
Teaching Hours:12
GENERALIZED LINEAR MODELS
 

 Review of simple and multiple linear regression - introduction to generalized linear models - parameter estimation of generalized linear models - models with different link functions - binary (logistic) regression - estimation and model fitting - Poisson regression for count data - mixed effect models and hierarchical models with practical examples.

Unit-4
Teaching Hours:12
EPIDEMIOLOGY
 

Introduction to epidemiology, measures of epidemiology, observational study designs: case report, case series, correlational studies, cross-sectional studies, retrospective and prospective studies, analytical epidemiological studies: case-control study and cohort study, odds ratio, relative risk, bias in epidemiological studies.
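
Illustrative sketch (not part of the prescribed text): relative risk and odds ratio, listed above, are both computed from a 2×2 exposure-by-outcome table. The cell labels a, b, c, d and the counts are assumptions for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)

# Hypothetical cohort: 100 exposed (30 cases), 100 unexposed (10 cases)
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
```

Relative risk suits cohort studies where incidence can be observed, while the odds ratio is the measure available from case-control designs.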

Unit-5
Teaching Hours:12
DEMOGRAPHY
 

Introduction to demography, mortality and life tables, infant mortality rate, standardized death rates, fertility, crude and specific rates, migration: definition and concepts, population growth, measurement of population growth: arithmetic, geometric and exponential, population projection and estimation, different methods of population projection, logistic curve, urban population growth, components of urban population growth.

Text Books And Reference Books:

1. Marcello Pagano and Kimberlee Gauvreau (2018), Principles of Biostatistics, 2nd Edition, Chapman and Hall/CRC press

2. David Moore S. and George McCabe P., (2017) Introduction to practice of statistics, 9th Edition, W. H. Freeman.

3. Sundar Rao and Richard J., (2012) Introduction to Biostatistics and research methods, PHI Learning Private limited, New Delhi 

Essential Reading / Recommended Reading

1. Abhaya Indrayan and Rajeev Kumar M., (2018) Medical Biostatistics, 4th Edition, Chapman and Hall/CRC Press.

2. Gordis Leon (2018), Epidemiology, 6th Edition, Elsevier, Philadelphia

3. Ram, F. and Pathak K. B., (2016): Techniques of Demographic Analysis, Himalaya Publishing house, Bombay

4. Park K., (2019), Park's Text Book of Preventive and Social Medicine, Banarsidas Bhanot, Jabalpur.

Evaluation Pattern

CIA 50%

ESE 50%

MDS342C - STOCHASTIC PROCESSES (2023 Batch)

Total Teaching Hours for Semester:45
No of Lecture Hours/Week:4
Max Marks:100
Credits:3

Course Objectives/Course Description

 

 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about non-parametric tests and its applications.

Course Outcome

CO1: Understand and apply the types of stochastic processes in various real-life scenarios.

CO2: Demonstrate a discrete space stochastic process in discrete index and estimate the evolving time in a state.

CO3: Apply probability arguments to model and estimate the counts in continuous time

CO4: Evaluate the extinction probabilities of a generation.

CO5: Development of renewal equations in discrete and continuous time.

CO6: Understand the stationary process and application in Time Series Modelling

Unit-1
Teaching Hours:9
INTRODUCTION TO STOCHASTIC PROCESSES
 

Classification of Stochastic Processes, Markov Processes – Markov Chain - Countable State Markov Chain. Transition Probabilities, Chapman - Kolmogorov's Equations, Calculation of n - step Transition Probability and its limit.
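
As an illustration of the n-step transition probabilities listed above, the following Python sketch raises a transition matrix to the n-th power and checks the Chapman-Kolmogorov relation numerically. The two-state matrix `P` and the helper names `mat_mul` and `n_step` are assumptions made for this example; they are not part of the syllabus.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def n_step(P, n):
    """Return the n-step transition matrix P^n by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Hypothetical two-state chain; the numbers are illustrative only.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Chapman-Kolmogorov: P^(m+n) = P^m P^n, checked here for m = 2, n = 3.
lhs = n_step(P, 5)
rhs = mat_mul(n_step(P, 2), n_step(P, 3))
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2)))

# For this chain, every row of P^n approaches the same limiting distribution.
print(n_step(P, 50))
```

For this particular chain the limiting rows solve pi = pi P, which gives (5/6, 1/6).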

 

Unit-2
Teaching Hours:9
POISSON PROCESS
 

 

Classification of States, Recurrent and Transient States - Transient Markov Chain, Random Walk. Continuous Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications.
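
A Poisson process can be simulated by accumulating independent exponential inter-arrival times. The sketch below is a minimal Python illustration under that standard construction; the function name `poisson_process_arrivals`, the rate, and the time horizon are assumptions chosen for the example.

```python
import random

def poisson_process_arrivals(rate, horizon, rng):
    """Arrival times in [0, horizon] of a Poisson process with the given rate."""
    arrivals = []
    t = rng.expovariate(rate)          # first inter-arrival time
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)     # add the next exponential gap
    return arrivals

rng = random.Random(42)                # fixed seed so the run is repeatable
counts = [len(poisson_process_arrivals(2.0, 10.0, rng)) for _ in range(2000)]
# The average count over many runs should be close to rate * horizon = 20.
print(sum(counts) / len(counts))
```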

Unit-3
Teaching Hours:9
BRANCHING PROCESS
 

Branching Processes – Galton – Watson Branching Process - Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny.
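
The extinction probability of a Galton-Watson process is the smallest non-negative root of s = G(s), where G is the offspring probability generating function. A minimal Python sketch, assuming a finite offspring distribution given as a list of probabilities (the function names and the example distributions are hypothetical), iterates q <- G(q) starting from q = 0:

```python
def pgf(offspring_probs, s):
    """Probability generating function G(s) = sum_k p_k * s**k."""
    return sum(p * s ** k for k, p in enumerate(offspring_probs))

def extinction_probability(offspring_probs, iterations=200):
    """Iterate q <- G(q) from q = 0; the iterates increase to the smallest fixed point."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(offspring_probs, q)
    return q

# Supercritical example: mean offspring 1.25, so extinction is not certain.
print(extinction_probability([0.25, 0.25, 0.5]))
# Subcritical example: mean offspring 0.5, so extinction is certain.
print(extinction_probability([0.5, 0.5]))
```

In the first example G(s) = 0.25 + 0.25s + 0.5s^2, whose fixed points are 0.5 and 1, so the iteration settles at 0.5.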

 

 

Unit-4
Teaching Hours:9
RENEWAL PROCESS
 

Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. 

Unit-5
Teaching Hours:9
STATIONARY PROCESS
 

 

Stationary Processes: Application to Time Series. Auto-covariance and Auto-correlation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average. Basic ideas of residual analysis, diagnostic checking, forecasting.
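
The sample auto-covariance and auto-correlation functions mentioned above can be computed directly from their definitions. The sketch below uses the common estimator that divides by n at every lag; that choice, the function names, and the toy series are assumptions for illustration.

```python
def autocovariance(x, lag):
    """Sample auto-covariance at the given lag, dividing by n."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

def autocorrelation(x, lag):
    """Sample auto-correlation: auto-covariance at the lag divided by the variance."""
    return autocovariance(x, lag) / autocovariance(x, 0)

series = [1.0, 2.0, 3.0, 4.0, 5.0]     # toy series, for illustration only
print([autocorrelation(series, k) for k in range(3)])
```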

Text Books And Reference Books:
  1. Stochastic Processes, R.G Gallager, Cambridge University Press, 2013.
  2. Stochastic Processes, S.M Ross, Wiley India Pvt. Ltd, 2008.
Essential Reading / Recommended Reading
  1. Stochastic Processes from Applications to Theory, P.D Moral and S. Penev, CRC Press, 2016. 

2. Introduction to Probability and Stochastic Processes with Applications, B. C. Liliana, A. Viswanathan, S. Dharmaraja, Wiley Pvt. Ltd, 2012.

Evaluation Pattern

CIA 50%  ESE 50%

MDS371 - CLOUD ANALYTICS (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

The objective of this course is to explore the basics of cloud analytics and the major cloud solutions. Students will learn how to analyze extremely large data sets and to create visual representations of that data. The course also aims to provide students with hands-on experience working with data at scale.

Course Outcome

CO1: Interpret the deployment and service models of cloud applications.

CO2: Describe big data analytical concepts.

CO3: Ingest, store, and secure data.

CO4: Process and visualize structured and unstructured data.

Unit-1
Teaching Hours:18
INTRODUCTION
 

Introduction to cloud computing - Major benefits of cloud computing - Cloud computing deployment models - Private cloud - Public cloud - Hybrid cloud - Types of cloud computing services - Infrastructure as a Service - PaaS - SaaS - Emerging cloud technologies and services - Different ways to secure the cloud - Risks and challenges with the cloud - What is cloud analytics? - Parameters before adopting cloud strategy - Technologies utilized by cloud computing

1. Creating Virtual Machines using Hypervisors

2. IaaS: Compute service - Creating and running Virtual Machines

Unit-2
Teaching Hours:18
CLOUD ENABLING TECHNOLOGIES
 

Virtualization - Load Balancing - Scalability & Elasticity - Deployment - Replication - Monitoring - Software Defined Networking - Network Function Virtualization - MapReduce - Identity and Access Management - Service Level Agreements - Billing

1. Storage as a Service: Ingesting & Querying data into cloud

2. Database as a Service: Building DB Server

Unit-3
Teaching Hours:18
BASIC CLOUD SERVICES & PLATFORMS
 

Compute Services

Amazon Elastic Compute Cloud - Google Compute Engine - Windows Azure Virtual Machines

Storage Services

Amazon Simple Storage Service - Google Cloud Storage - Windows Azure Storage

Database Services

Amazon Relational Data Store - Amazon DynamoDB - Google Cloud SQL - Google Cloud Datastore - Windows Azure SQL Database - Windows Azure Table Service

1. PaaS: Working with Google App Engine

Unit-4
Teaching Hours:18
DATA INGESTION AND STORING
 

Cloud Dataflow - The Dataflow programming model - Cloud Pub/Sub - Cloud storage - Cloud SQL - Cloud BigTable - Cloud Spanner - Cloud Datastore - Persistent disks

1. Database as a Service: Building DB Server

2. Transforming data

PROCESSING AND VISUALIZING

Google BigQuery - Cloud Dataproc - Google Cloud Datalab - Google Data Studio

1. Visualize structured data and unstructured data

Unit-5
Teaching Hours:18
MACHINE LEARNING, DEEP LEARNING AND AI
 

Services on Artificial intelligence - Machine learning - Cloud Natural Language API - TensorFlow - Cloud Speech API - Cloud Translation API - Cloud Vision API - Cloud Video Intelligence - Dialogflow - AutoML

1. Load and query data in a data warehouse

2. Setting up and executing a data pipeline job to load data into cloud

Text Books And Reference Books:

1. Sanket Thodge, Cloud Analytics with Google Cloud Platform, Packt Publishing, 2018.

2. Arshdeep Bahga and Vijay Madisetti, Cloud computing - A Hands-On Approach, CreateSpace Independent Publishing Platform, 2014.

Essential Reading / Recommended Reading

1. Deven Shah, Kailash Jayaswal, Donald J. Houde, Jagannath Kallakurchi, Cloud Computing - Black Book, Wiley, 2014.

2. Thomas Erl, Ricardo Puttini, Zaigham Mahmood, Cloud Computing: Concepts, Technology & Architecture, Prentice Hall, 2014.

Evaluation Pattern

CIA 50%

ESE 50%

MDS372 - BUSINESS INTELLIGENCE (2022 Batch)

Total Teaching Hours for Semester:75
No of Lecture Hours/Week:5
Max Marks:4
Credits:4

Course Objectives/Course Description

 

This course is designed to introduce students to the concepts of business intelligence and also to provide students with an understanding of data warehousing and data mining, along with associated tools and techniques and their benefits to organizations of all sizes.

Course Outcome

CO1: Understand the fundamentals of Business Intelligence and Analytics

CO2: Apply the data warehouse concepts required for Business Intelligence

CO3: Build a performance dashboard using data visualization and visual analytics.

CO4: Implement the business intelligence perspective of data mining and text mining

Unit-1
Teaching Hours:15
Overview of Business Intelligence
 

An Overview of Business Intelligence, Analytics, and Decision Support: Changing Business Environments and Computerized Decision Support - A Framework for Business Intelligence (BI) - Transaction Processing versus Analytic Processing - Successful BI Implementation - Business Analytics Overview: Descriptive Analytics - Predictive Analytics - Prescriptive Analytics - Brief Introduction to Big Data Analytics - Applications of BI.

Lab Exercises:

1. Case study on Transaction Processing.

2. Case study on Predictive Analytics.

 

Unit-2
Teaching Hours:15
Business Intelligence Tools and Applications
 

Ad hoc Analysis - Online Analytical Processing - Mobile BI - Real-time BI - Operational Intelligence - Open-Source BI - Embedded BI - Collaborative BI - Location Intelligence - Business intelligence vendors and market

Lab Exercises:

1. Exercise on OLAP in BI.

2. Exercise on Real-time BI.

Unit-3
Teaching Hours:15
Power BI
 

Power BI Overview - Installation - Data Sources - Query Editor - Importing Files - Data Modeling Lookup Data Tables - Active vs. Inactive Relationships - Roles - Refreshing Data and Hierarchies - Data Modeling - DAX - Calculated Columns - Measures - Design and Interactive Reports - Dashboard

Lab Exercises:

1. Exercise on Data modeling in Power BI.

2. Exercise on Dashboards & Reports in Power BI.

Unit-4
Teaching Hours:15
Tableau Basics
 

Tableau Overview - Data Sources, First Bar Chart Graph - the Extracted Data - Knowledge of Aggregation, Granularity, and Time-Series - Working with Charts and Filter - Overview of First Dashboard, Maps, and Scatter Plots - Joins and Relationship - Data Joining - Map Creation.

Lab Exercises:

1. Exercise on Extracted Data in Tableau.

2. Exercise on Joins, Maps, Plots in Tableau.

Unit-5
Teaching Hours:15
Working with Tableau
 

First Dashboard Creation with Highlighting and Filters - Overview of Dual-axis Chart, Joining, Relationship, and Blending - Joining with Different Conditions, i.e., Multiple Fields and Duplicate Values - Working on Blending Data - Creation of Dual Axis Chart - Understanding of Calculated Fields - Understanding of Relationship Data - Overview of New Dashboard - Updated Way of Data Preparation - Overview of New Design Feature and Many More - Advancement in Tableau

Lab Exercises:

1. Exercise on Dashboard Creation and Blending in Tableau.

2. Exercise on Data preparation, Data relationship and fields in Tableau.

Text Books And Reference Books:

1.  Chandraish Sinha (2022). “Mastering Power BI”, 1st Edition, BPB Publications.

2.  Marleen, David, “Mastering Tableau 2021: Implement advanced business intelligence techniques and analytics with Tableau”, 3rd Edition, Packt.

3.  Ramesh Sharda, Dursun Delen, Efraim Turban (2017). “Business Intelligence: A Managerial Perspective on Analytics”, 3rd Edition, Pearson Publication.

Essential Reading / Recommended Reading

1.  Ahmed Sherif (2016). “Practical Business Intelligence”, Packt Publishing.

Evaluation Pattern

CIA 50%

ESE 50%

MDS373A - NATURAL LANGUAGE PROCESSING (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

The goal is to familiarize students with the concepts of the study of human language from a computational perspective. The course covers syntactic, semantic and discourse processing models, emphasizing machine learning concepts.

Course Outcome

Unit-1
Teaching Hours:15
INTRODUCTION
 

 Introduction to NLP- Background and overview- NLP Applications -NLP hard Ambiguity Algorithms and models, Knowledge Bottlenecks in NLP- Introduction to NLTK, Case study

Lab:

1. Write a program to tokenize text

2. Write a program to count word frequency and to remove stop words
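
The two lab tasks above can be sketched with the Python standard library alone. The unit introduces NLTK, which provides its own tokenizers and stop-word lists; the regular-expression tokenizer and the tiny `STOP_WORDS` set below are simplifying assumptions made so the example runs without any downloads.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real exercise would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "was", "on", "in", "of", "and", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count tokens after removing stop words."""
    return Counter(token for token in tokenize(text) if token not in stop_words)

sample = "The cat sat on the mat. The mat was flat."
print(tokenize(sample))
print(word_frequencies(sample).most_common(3))
```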

 

Unit-2
Teaching Hours:15
PARSING AND SYNTAX
 

 Word Level Analysis: Regular Expressions, Text Normalization, Edit Distance, Parsing and Syntax- Spelling, Error Detection and correction-Words and Word classes- Part-of Speech Tagging, Naive Bayes and Sentiment Classification: Case study.
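
Edit distance, listed above, is usually computed with dynamic programming. The sketch below is a minimal Python version of the Levenshtein distance, assuming unit cost for insertion, deletion and substitution; some textbooks charge 2 for a substitution, which changes the numbers. The function name is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    # previous[j] holds the distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (0 if s_char == t_char else 1)
            current.append(min(previous[j] + 1,      # delete s_char
                               current[j - 1] + 1,   # insert t_char
                               substitution))
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, which is enough when just the distance, not the alignment, is needed.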

Lab:

3. Write a program to tokenize Non-English Languages

4. Write a program to get synonyms from WordNet

 

 

Unit-3
Teaching Hours:15
SMOOTHED ESTIMATION AND LANGUAGE MODELLING
 

 N-gram Language Models: N-Grams, Evaluating Language Models -The language modelling problem SEMANTIC ANALYSIS AND DISCOURSE PROCESSING Semantic Analysis: Meaning Representation-Lexical Semantics- Ambiguity-Word Sense Disambiguation. Discourse Processing: cohesion-Reference Resolution- Discourse Coherence and Structure.
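
To make the N-gram and smoothing ideas concrete, here is a minimal Python sketch of a bigram model with add-one (Laplace) smoothing. Add-one smoothing is only one of several smoothing schemes and is chosen here as an assumption; the two-sentence corpus and the function names are hypothetical.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

unigrams, bigrams = train_bigram_model(["I am Sam", "Sam I am"])
print(bigram_probability("i", "am", unigrams, bigrams))   # seen bigram
print(bigram_probability("i", "sam", unigrams, bigrams))  # unseen bigram, still non-zero
```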

Lab:

 

5. Write a program to get Antonyms from WordNet

6. Write a program for stemming Non-English words 

 

 

Unit-4
Teaching Hours:15
NATURAL LANGUAGE GENERATION AND MACHINE TRANSLATION
 

 Natural Language Generation: Architecture of NLG Systems, Applications Machine Translation: Problems in Machine Translation- Machine Translation Approaches Evaluation of Machine Translation systems. Case study: Characteristics of Indian Language

Lab:

 

7. Write a program for lemmatizing words Using WordNet 

8. Write a program to differentiate stemming and lemmatizing words 

 

 

Unit-5
Teaching Hours:15
INFORMATION RETRIEVAL AND LEXICAL RESOURCES
 

Information Retrieval: Design features of Information Retrieval Systems - Classical, Non-classical, Alternative Models of Information Retrieval - Evaluation. Lexical Resources: Word Embeddings - Word2vec - GloVe. UNSUPERVISED METHODS IN NLP: Graphical Models for Sequence Labelling in NLP.

Lab:

 

9. Write a program for POS Tagging or Word Embeddings. 

10. Case study-based program (IBM) or Sentiment analysis

 

Text Books And Reference Books:

1. Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2nd Edition, Prentice Hall, 2013.

2. Foundations of Statistical Natural Language Processing, Christopher D. Manning and Hinrich Schütze, MIT Press, Cambridge, MA, 1999.

Essential Reading / Recommended Reading

1. Foundations of Computational Linguistics: Human-computer Communication in Natural Language, Roland R. Hausser, Springer, 2014.

2. Steven Bird, Ewan Klein and Edward Loper Natural Language Processing with Python, O’Reilly Media; 1 edition, 2009.

Evaluation Pattern

CIA 50%

ESE 50%

MDS373B - HADOOP (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

The subject is intended to give knowledge of how Big Data is evolving in real-time applications and how it is manipulated using emerging technologies. This course breaks down the walls of complexity in processing Big Data by providing a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform.

Course Outcome

CO1: Understand the Big Data concepts in real time scenario

CO2: Understand the big data systems and identify the main sources of Big Data in the real world.

CO3: Demonstrate an ability to use Hadoop framework for processing Big Data for Analytics.

CO4: Evaluate the Map reduce approach for different domain problems.

Unit-1
Teaching Hours:15
INTRODUCTION
 

 

Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data analytics, Big data applications, Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce.
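
Matrix-vector multiplication fits the MapReduce pattern naturally: each matrix entry emits a partial product keyed by its row, and the reducer sums the partial products per row. The sketch below imitates that flow in plain Python within one process. It is not Hadoop code; the function names and the 2x2 example are assumptions for illustration, and the vector is assumed small enough to be available to every mapper.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Each entry (i, j, m_ij) emits the key-value pair (i, m_ij * v_j)."""
    for i, j, value in matrix_entries:
        yield i, value * vector[j]

def reduce_phase(pairs):
    """Sum all partial products that share the same row index."""
    totals = defaultdict(float)
    for row, partial in pairs:
        totals[row] += partial
    return dict(totals)

# Matrix [[1, 2], [3, 4]] stored as (row, column, value) triples, times vector [5, 6].
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
vector = [5.0, 6.0]
print(reduce_phase(map_phase(entries, vector)))
```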

 

Apache Hadoop – Moving Data in and out of Hadoop – Understanding inputs and outputs of MapReduce - Data Serialization, Problems with traditional large-scale systems - Requirements for a new approach - Hadoop – Scaling - Distributed Framework - Hadoop v/s RDBMS - Brief history of Hadoop.

Lab Exercise

1. Installing and Configuring Hadoop

Unit-2
Teaching Hours:15
CONFIGURATIONS OF HADOOP
 

 

Hadoop Processes (NN, SNN, JT, DN, TT) - Temporary directory – UI - Common errors when running Hadoop cluster, solutions. Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop.

Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing MapReduce programs, Hadoop-specific data types, Input/output. Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a large dataset.

 

Lab Exercise

1. Word count application in Hadoop.

2. Sorting the data using MapReduce.

3. Finding max and min value in Hadoop.
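
The word count exercise follows the classic map, shuffle, reduce sequence. The sketch below reproduces that sequence in plain Python within a single process so the data flow is visible. It does not use the Hadoop Java API covered in this unit, and every function name in it is hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(mapped).items())

print(word_count(["big data big", "data hadoop"]))
```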

Unit-3
Teaching Hours:15
ADVANCED MAPREDUCE TECHNIQUES
 

 

Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data structures. Hadoop configuration properties - Setting up a cluster, Cluster access control, managing the NameNode, Managing HDFS, MapReduce management, Scaling.

Lab Exercise:

1. Implementation of decision tree algorithms using MapReduce.

2. Implementation of K-means Clustering using MapReduce.

3. Generation of Frequent Itemset using MapReduce.

Unit-4
Teaching Hours:15
HADOOP STREAMING
 

Hadoop Streaming - Streaming Command Options - Specifying a Java Class as the Mapper/Reducer - Packaging Files With Job Submissions - Specifying Other Plug-ins for Jobs.

Lab Exercise:

1. Count the number of missing and invalid values through joining two large given datasets.

2. Using Hadoop’s MapReduce, evaluate the number of products sold in each country in the online shopping portal. Dataset is given.

3. Analyze the sentiment of product reviews using a MapReduce technique provided by Apache Hadoop.

Unit-5
Teaching Hours:15
HIVE & PIG
 

 

Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning & Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig Latin.

Lab Exercise

1. Trend Analysis based on Access Pattern over Web Logs using Hadoop.

2. Service Rating Prediction by Exploring Social Mobile Users Geographical Locations.

Unit-6
Teaching Hours:15
HBase
 

 

RDBMS vs NoSQL, HBasics, Installation, Building an online query application – Schema design, Loading Data, Online Queries, Successful service.

Hands On: Single Node Hadoop Cluster Setup in any cloud service provider - How to create instance. How to connect to that instance using PuTTY. Installing Hadoop framework on this instance. Run sample programs which come with Hadoop framework.

Text Books And Reference Books:

[1]   Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley, 2015.

[2]   Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.

 

 

Essential Reading / Recommended Reading

[1]   Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.

 

Evaluation Pattern

CIA 50%

ESE 50%

MDS373C - BIO INFORMATICS (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

To enable the students to learn information search and retrieval, Genome analysis and Gene mapping, alignment of multiple sequences and PERL for Bioinformatics.

Course Outcome

CO1: Understand the molecular Biology and Bioinformatics applications.

CO2: Apply the modeling and simulation technologies in Biology and medicine.

CO3: Evaluate the algorithms to find the similarity between protein and DNA sequences.

Unit-1
Teaching Hours:18
BIOINFORMATICS
 

Introduction, Historical Overview and Definition, Applications, Major databases in Bioinformatics, Data management and Analysis, Central Dogma of Molecular Biology.

INFORMATION SEARCH AND RETRIEVAL

Introduction, Tools for web search, Data retrieval tools, Data mining of Biological databases.

Lab Exercise

1.  Test and verify the basic Linux commands and Filters.

2.  Create the file(s) and verify the file handling commands.

Unit-2
Teaching Hours:18
GENOME ANALYSIS AND GENE MAPPING
 

 

GENOME ANALYSIS AND GENE MAPPING

Introduction, Genome analysis, Genome mapping, Sequence assembly problem, Genetic mapping and linkage analysis, Physical maps, Cloning the entire Genome, Genome sequencing, Applications of Genetic maps, Identification of Genes in Contigs, Human Genome Project.

ALIGNMENT OF PAIRS OF SEQUENCES

Introduction, Biological motivation of alignment, Methods of sequence alignments, Using score matrices, Measuring sequence detection

Lab Exercise

1.  Create directories and verify the directory commands.

2.  Perform basic mathematical operations using PERL.

3.  Write a PERL script to demonstrate the Array operations and Regular expressions.

Unit-3
Teaching Hours:18
ALIGNMENT OF MULTIPLE SEQUENCES
 

ALIGNMENT OF MULTIPLE SEQUENCES

Methods of multiple sequence alignment, Evaluating multiple alignments, Applications of multiple alignments, Phylogenetic analysis, Methods of phylogenetic analysis, Tree evaluation, Problems in Phylogenetic analysis.

TOOLS FOR SIMILARITY SEARCH AND SEQUENCE ALIGNMENT

Introduction, Working with FASTA, Working with BLAST, Filtering and Gapped BLAST, FASTA and BLAST algorithm comparison.

Lab Exercise:

1.  Write a PERL script to concatenate DNA sequences.

2.  Write a PERL script to transcribe DNA sequence into RNA sequence.

3.  Write a PERL script to calculate the reverse complement of a strand of DNA.
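
The logic behind the three exercises above is short enough to sketch. The course itself uses PERL; the version below is written in Python purely for illustration, and the function names are hypothetical. Transcription is modelled, as is common in introductory treatments, by replacing T with U on the coding strand.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def concatenate(*fragments):
    """Join DNA fragments into one sequence."""
    return "".join(fragment.upper() for fragment in fragments)

def transcribe(dna):
    """DNA to RNA: replace every T with U."""
    return dna.upper().replace("T", "U")

def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]

dna = concatenate("ACGG", "TTAC")
print(dna, transcribe(dna), reverse_complement(dna))
```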

Unit-4
Teaching Hours:18
PERL FOR BIOINFORMATICS
 

Sequences and Strings: Representing sequence data, Program to store a DNA sequence, Concatenating DNA fragments, Transcription DNA to RNA, Proteins, Files and Arrays, Reading Proteins in Files, Arrays, Scalar and List Context.

Motifs and Loops: Flow control, Code layout, Finding motifs, Counting Nucleotides, Exploding strings and arrays, Operating on strings. Subroutine and Bugs: Subroutines, Scoping and Subroutines, Command line arguments and Arrays, Passing data to Subroutines, Modules and Libraries of Subroutines.

Lab Exercise

1.  Write a PERL script to read protein sequence data from a file.

2.  Write a PERL script to search for a motif in a DNA sequence.

Unit-5
Teaching Hours:18
THE GENETIC CODE
 

Hashes, Data structure and algorithms for Biology, Translating DNA into Proteins, Reading DNA from the files in FASTA format, Reading Frames.

GenBank: GenBank files, GenBank Libraries, Separating Sequence and Annotation, Parsing Annotations, Indexing GenBank with DBM. Protein Data Bank: Files and Folders, PDB Files, Parsing PDB Files.

Lab Exercises:

1. Write a PERL script to append ACGT to DNA using a subroutine.

2. Case Study:

a. To retrieve the sequence of the Human keratin protein from UniProt database and to interpret the results.

b. To retrieve the sequence of the Human keratin protein from GenBank database and to interpret the results.

Text Books And Reference Books:

 

[1]   Bioinformatics: Methods and Applications, S. C. Rastogi, Namita Mendiratta and Parag Rastogi, 4th Edition, PHI Learning, 2013.

[2]   Beginning Perl for Bioinformatics, Tisdall James, 1st edition, Shroff Publishers (O’Reilly), 2009.

Essential Reading / Recommended Reading

[1]   Introduction to Bioinformatics, Arthur M Lesk, 2nd Edition, Oxford University Press, 4th edition, 2014.

[2]   Bioinformatics Technologies, Yi-Ping Phoebe Chen (Ed), 1st edition, Springer, 2005.

[3]   Bioinformatics Computing, Bryan Bergeron, 2nd Edition, Prentice Hall, 1st edition, 2003.

Web resources:

[1]   http://cac.annauniv.edu/PhpProject1/aidetails/afug_2013_fu/24.%20BIO%20MED.pdf

[2]   https://www.amrita.edu/school/biotechnology/academics/pg/introductionbioinformaticsbif410

[3]   https://canvas.harvard.edu/courses/8084/assignments/syllabus

[4]   https://www.coursera.org/specializations/bioinformatics

[5]   http://www.dtc.ox.ac.uk/modules/introduction-bioinformatics-bioscientists.html

Evaluation Pattern

CIA 50%

ESE 50%

MDS373D - EVOLUTIONARY ALGORITHMS (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

 

This course enables students to understand the core concepts of evolutionary computing techniques and the popular evolutionary algorithms that are used in solving optimization problems. Students will be able to implement custom solutions for real-time problems to which evolutionary computing is applicable.

 

Course Outcome

CO1: Basic understanding of evolutionary computing concepts and techniques

CO2: Classify relevant real-time problems for the applications of evolutionary algorithms

Unit-1
Teaching Hours:18
INTRODUCTION TO EVOLUTIONARY COMPUTING
 

 

Terminologies – Notations – Problems to be solved – Optimization – Modeling – Simulation – Search problems – Optimization constraints

Lab Program

1. Implementation of single and multi-objective functions

2. Implementation of binary GA

 

Unit-2
Teaching Hours:18
EVOLUTIONARY PROGRAMMING
 

 

Continuous evolutionary programming – Finite state machine optimization – Discrete evolutionary programming – The Prisoner’s dilemma

EVOLUTION STRATEGY

One plus one evolution strategy – The 1/5 Rule – (μ+1) evolution strategy – Self adaptive evolution strategy
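
The one plus one evolution strategy and the 1/5 rule can be illustrated together: one parent produces one mutated child per generation, the better of the two survives, and the mutation step size grows when more than one fifth of recent mutations succeed and shrinks when fewer do. The sketch below is a minimal Python version under assumed settings. The sphere objective, the adaptation factor 0.82, the window of 20 generations and the function names are all choices made for this example, not values prescribed by the syllabus.

```python
import random

def sphere(x):
    """Toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)

def one_plus_one_es(objective, x0, sigma=1.0, generations=500, window=20, seed=0):
    rng = random.Random(seed)
    parent, parent_cost = list(x0), objective(x0)
    successes = 0
    for generation in range(1, generations + 1):
        # Mutate every coordinate with Gaussian noise of standard deviation sigma.
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_cost = objective(child)
        if child_cost < parent_cost:           # keep the child only if it is better
            parent, parent_cost = child, child_cost
            successes += 1
        if generation % window == 0:           # apply the 1/5 rule once per window
            rate = successes / window
            if rate > 0.2:
                sigma /= 0.82                  # many successes: take larger steps
            elif rate < 0.2:
                sigma *= 0.82                  # few successes: take smaller steps
            successes = 0
    return parent, parent_cost

best, cost = one_plus_one_es(sphere, [5.0, 5.0, 5.0])
print(cost)
```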

Lab Program

1. Implementation of continuous GA

2. Implementation of evolutionary programming

 

Unit-3
Teaching Hours:18
GENETIC PROGRAMMING
 

 

Fundamentals of genetic programming – Genetic programming for minimal time control

EVOLUTIONARY ALGORITHM VARIATION

Initialization – Convergence – Population diversity – Selection option – Recombination – Mutation

Lab Program

1. Implementation of genetic programming

2. Implementation of Ant Colony Optimization

Unit-4
Teaching Hours:18
ANT COLONY OPTIMIZATION
 

ANT COLONY OPTIMIZATION

Pheromone models – Ant system – Continuous Optimization – Other Ant System

PARTICLE SWARM OPTIMIZATION

Velocity limiting – Inertia weighting – Global Velocity updates – Fully informed Particle Swarm
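
Inertia weighting and velocity limiting both appear in the velocity update of a basic global-best particle swarm. The following Python sketch shows one common form of that update. The parameter values (`inertia`, `c1`, `c2`, `v_max`), the sphere objective and the function names are assumptions for illustration, and the fully informed variant is not covered.

```python
import random

def sphere(x):
    """Toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)

def pso(objective, dim, bounds, particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, v_max=1.0, seed=0):
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]                     # personal bests
    best_cost = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_cost[i])
    global_pos, global_cost = best_pos[g][:], best_cost[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]                          # inertia weighting
                     + c1 * r1 * (best_pos[i][d] - pos[i][d])     # pull to personal best
                     + c2 * r2 * (global_pos[d] - pos[i][d]))     # pull to global best
                vel[i][d] = max(-v_max, min(v_max, v))            # velocity limiting
                pos[i][d] += vel[i][d]
            cost = objective(pos[i])
            if cost < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], cost
                if cost < global_cost:
                    global_pos, global_cost = pos[i][:], cost
    return global_pos, global_cost

position, cost = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(position, cost)
```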

Lab Program

1. Implementation of Particle Swarm Optimization

2. Implementation of Multi-Objective Optimization

 

Unit-5
Teaching Hours:18
MULTI-OBJECTIVE OPTIMIZATION
 

 

Pareto Optimality – Hyper volume – Relative coverage – Non-pareto based EAs – Pareto based EAs – Multi-objective Biogeography based optimization
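
Pareto optimality rests on the dominance relation: one solution dominates another if it is no worse in every objective and strictly better in at least one. A minimal Python sketch, assuming every objective is to be minimized and using hypothetical function names and data, filters a list of objective vectors down to its non-dominated set:

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))
```

This pairwise check costs a number of comparisons quadratic in the number of points, which is fine for small populations.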

Lab Program

1. Simulation of EA in Planning problems (routing, scheduling, packing) and Design problems (Circuit, structure, art)

2. Simulation of EA in classification/prediction modelling

 

Text Books And Reference Books:

[1]   D. Simon, Evolutionary optimization algorithms: biologically inspired and population-based approaches to computer intelligence. New Jersey: John Wiley, 2013.

[2]   A. Eiben and J. Smith, Introduction to evolutionary computing. 2nd ed. Berlin: Springer, 2015.

Essential Reading / Recommended Reading

1.  D. Goldberg, Genetic algorithms in search, optimization, and machine learning. Boston: Addison-Wesley, 2012.

2.  K. Deb, Multi-objective optimization using evolutionary algorithms. Chichester: John Wiley & Sons, 2009.

3.  R. Poli, W. Langdon, N. McPhee and J. Koza, A field guide to genetic programming. [S.l.]: Lulu Press, 2008.

4.  T. Bäck, Evolutionary algorithms in theory and practice. New York: Oxford Univ. Press, 1996.

 

 

 

Web Resources:

 

1  A. E. Eiben and J. E. Smith, "Introduction to Evolutionary Computing | The on-line accompaniment to the book Introduction to Evolutionary Computing", Evolutionarycomputation.org, 2015. [Online]. Available: http://www.evolutionarycomputation.org/.

2  F. Lobo, "Evolutionary Computation 2018/2019", Fernandolobo.info, 2018. [Online]. Available: http://www.fernandolobo.info/ec1819.

3  "ECLab Tools", Cs.gmu.edu, 2008. [Online]. Available: https://cs.gmu.edu/~eclab/tools.html.

 

Evaluation Pattern

CIA 50%

ESE 50%

MDS373E - OPTIMIZATION TECHNIQUE (2022 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

 

This course will help the students to acquire and demonstrate the implementation of the necessary algorithms for solving advanced level Optimization techniques.

Course Outcome

CO1: Apply the notions of linear programming in solving transportation problems

CO2: Understand the theory of games for solving simple games

CO3: Use linear programming in the formulation of the shortest route problem.

CO4: Apply algorithmic approach in solving various types of network problems

CO5: Create applications using dynamic programming.

Unit-1
Teaching Hours:18
INTRODUCTION
 

INTRODUCTION

Operations Research Methods - Solving the OR model - Queuing and Simulation models – Art of modelling – phases of OR study.

MODELLING WITH LINEAR PROGRAMMING

Two variable LP model – Graphical LP solution – Applications. Simplex method and sensitivity analysis – Duality and post-optimal Analysis - Formulation of the dual problem.

Lab Exercise

1.  Simplex Method

2.  Dual Simplex Method

                                                                                          

Unit-2
Teaching Hours:18
TRANSPORTATION MODEL
 

TRANSPORTATION MODEL

Determination of the Starting Solution – Iterative computations of the transportation algorithm. Assignment Model: The Hungarian Method – Simplex explanation of the Hungarian Method – The trans-shipment Model.

Lab Exercise

1.  Balanced Transportation Problem

2.  Unbalanced Transportation Problem

3.  Assignment Problems

Unit-3
Teaching Hours:18
NETWORK MODELS
 

NETWORK MODELS

Minimal Spanning tree Algorithm – Linear Programming formulation of the shortest-route problem. Maximal Flow Model: Enumeration of cuts – Maximal Flow Diagram – Linear Programming Formulation of Maximal Flow Model.

CPM and PERT

Network Representation – Critical Path Computations – Construction of the time Schedule – Linear Programming formulation of CPM – PERT networks.

Lab Exercise:

1. Shortest path computations in a network

2. Maximum flow problem
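
For the shortest path exercise, one widely used approach is Dijkstra's algorithm, sketched below in Python. The unit itself lists the linear programming formulation of the shortest-route problem; Dijkstra's algorithm is swapped in here as an assumption because it is compact, and it requires non-negative arc lengths. The network and the function name are hypothetical.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest distances from source, non-negative weights only."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry, a shorter path was found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Directed network given as node -> list of (neighbor, arc length).
network = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
print(shortest_paths(network, "A"))
```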

Unit-4
Teaching Hours:18
GAME THEORY
 

GAME THEORY

Strategic Games and examples-Nash equilibrium and examples-Optimal Solution of two person zero sum games-Solution of Mixed strategy games-Mixed strategy Nash equilibrium-Dominated action with example.
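
For a 2x2 two-person zero-sum game, the optimal solution can be written in closed form: if the game has a saddle point, pure strategies are optimal; otherwise the row player mixes so that both columns yield the same expected payoff. The Python sketch below follows that reasoning. The function name, the return convention and the example payoffs are assumptions for illustration.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Row player's payoff matrix [[a, b], [c, d]].

    Returns (p, value), where p is the probability of playing row 1,
    or (None, value) when a saddle point makes pure strategies optimal.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return None, maximin
    denominator = a - b - c + d
    p = (d - c) / denominator             # equalizes the payoff against both columns
    value = (a * d - b * c) / denominator
    return p, value

print(solve_2x2_zero_sum(1, -1, -1, 1))   # matching pennies
print(solve_2x2_zero_sum(3, -1, -2, 4))
```

In the second example the row player plays row 1 with probability 0.6, which gives an expected payoff of 1 against either column.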

GOAL PROGRAMMING

Formulation – Tax Planning Problem – Goal Programming algorithms – Weights method – Preemptive method.

Lab Exercise:

1.  Critical path Computations

2.  Game Programming

Unit-5
Teaching Hours:18
MARKOV CHAINS
 

MARKOV CHAINS

Definition – Absolute and n-step Transition Probability – Classification of states.

DYNAMIC PROGRAMMING

Recursive nature of computation in Dynamic Programming – Forward and Backward Recursion – Knapsack / Fly Away / Cargo-Loading Model – Equipment Replacement Model.
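
The recursive nature of dynamic programming is easy to see in the knapsack model. The Python sketch below solves the 0/1 variant, in which each item is taken at most once; that restriction, the item data and the function name are assumptions for illustration, since cargo-loading models may also allow several units of an item.

```python
def knapsack(weights, values, capacity):
    """Maximum total value of items that fit in the capacity (each item used at most once)."""
    # best[c] is the best value achievable with capacity c using the items seen so far.
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Go through capacities in decreasing order so each item is counted only once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```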

Lab Exercise:

1.  Goal Programming

2.  Dynamic Programming

 

 

Text Books And Reference Books:

 

1.  Hamdy A Taha, Operations Research, 9th Edition, Pearson Education, 2012.

2.  Garrido José M. Introduction to Computational Models with Python. CRC Press, 2016.

Essential Reading / Recommended Reading

1.  Rathindra P Sen, Operations Research – Algorithms and Applications, PHI Learning Pvt. Limited, 2011

2.  R. Ravindran, D. T. Philips and J. J. Solberg, Operations Research: Principles and Practice, 2nd ed., John Wiley & Sons, 2007.

3.  F. S. Hillier and G. J. Lieberman, Introduction to operations research, 8th ed., McGraw-Hill Higher Education, 2004.

4.  K. C. Rao and S. L. Mishra, Operations research, Alpha Science International, 2005.

5.  Hart, William E. Pyomo: Optimization Modeling in Python. Springer, 2012.

6.  Martin J. Osborne, An introduction to Game theory, Oxford University Press, 2008

Evaluation Pattern

CIA 50%

ESE 50%

MDS381 - SPECIALIZATION PROJECT (2022 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:2

Course Objectives/Course Description

 

The course is designed to provide a real-world project development and deployment environment for the students.

Course Outcome

CO1: Identify the problem and relevant analytics for the selected domain.

CO2: Apply appropriate design/development strategy and tools.

Unit-1
Teaching Hours:60
Specialization Project
 

The project will be based on the specialization domains that students have opted for during this semester.

Text Books And Reference Books:

[1]. Statistics: An Introduction Using R, Michael J. Crawley, WILEY, Second Edition, 2015.

 

Essential Reading / Recommended Reading

[1]. Hands-on programming with R, Garrett Grolemund, O’Reilly, 1st Edition, 2014

[2]. R for everyone, Jared Lander, Pearson, 1st Edition, 2014

Evaluation Pattern

CIA 50%

ESE 50%

MDS481 - INDUSTRY PROJECT (2022 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:300
Credits:12

Course Objectives/Course Description

 

This course helps students to become globally competent and inculcates entrepreneurial skills among them.

 

Course Outcome

CO1: Develop Real time Projects

CO2: Practice different data science principles and strategies in the project

Unit-1
Teaching Hours:30
Project Work
 

 

 

It is a full time project to be taken up either in the industry or in an R&D organization.

 

Text Books And Reference Books:

-

Essential Reading / Recommended Reading

-

Evaluation Pattern

CIA 50%

ESE 50%